By Gerhard Weikum, Max Planck Institute for Informatics, Germany, weikum@mpi-inf.mpg.de | Xin Luna Dong, Amazon, USA, lunadong@amazon.com | Simon Razniewski, Max Planck Institute for Informatics, Germany, srazniew@mpi-inf.mpg.de | Fabian Suchanek, Telecom Paris University, France, suchanek@telecom-paris.fr
Equipping machines with comprehensive knowledge of the world’s entities and their relationships has been a longstanding goal of AI. Over the last decade, large-scale knowledge bases, also known as knowledge graphs, have been automatically constructed from web contents and text sources, and have become a key asset for search engines. This machine knowledge can be harnessed to semantically interpret textual phrases in news, social media and web tables, and contributes to question answering, natural language processing and data analytics.
This monograph surveys fundamental concepts and practical methods for creating and curating large knowledge bases from online content, with emphasis on semi-structured web pages with lists, tables, etc., and on unstructured text sources. It covers models and methods for discovering and canonicalizing entities and their semantic types and for organizing them into clean taxonomies. On top of this, it discusses the automatic extraction of entity-centric properties. To support the long-term life-cycle and the quality assurance of machine knowledge, it presents methods for constructing open schemas and for knowledge curation. Case studies on academic projects and industrial knowledge graphs complement the survey of concepts and methods.
The intended audience is students and researchers interested in a wide spectrum of topics: from machine knowledge and data quality to machine learning and data science, as well as applications in web content mining and natural language understanding. The monograph will also be of interest to industrial practitioners working on semantic technologies for web, social media, or enterprise content.