By Gerhard Weikum, Max Planck Institute for Informatics, Germany, weikum@mpi-inf.mpg.de | Xin Luna Dong, Amazon, USA, lunadong@amazon.com | Simon Razniewski, Max Planck Institute for Informatics, Germany, srazniew@mpi-inf.mpg.de | Fabian Suchanek, Telecom Paris University, France, suchanek@telecom-paris.fr
Equipping machines with comprehensive knowledge of the world’s entities and their relationships has been a longstanding goal of AI. Over the last decade, large-scale knowledge bases, also known as knowledge graphs, have been automatically constructed from web contents and text sources, and have become a key asset for search engines. This machine knowledge can be harnessed to semantically interpret textual phrases in news, social media and web tables, and contributes to question answering, natural language processing and data analytics.
This monograph surveys fundamental concepts and practical methods for creating and curating large knowledge bases from online content, with emphasis on semi-structured web pages with lists, tables, etc., and on unstructured text sources. It covers models and methods for discovering and canonicalizing entities and their semantic types and for organizing them into clean taxonomies. On top of this, it discusses the automatic extraction of entity-centric properties. To support the long-term life-cycle and the quality assurance of machine knowledge, it presents methods for constructing open schemas and for knowledge curation. Case studies on academic projects and industrial knowledge graphs complement the survey of concepts and methods.
The intended audience is students and researchers interested in a wide spectrum of topics: from machine knowledge and data quality to machine learning and data science, as well as applications in web content mining and natural language understanding. The monograph will also be of interest to industrial practitioners working on semantic technologies for web, social media, or enterprise content.