Foundations and Trends® in Databases > Vol 12 > Issue 1

Modern Datalog Engines

By Bas Ketsman, Vrije Universiteit Brussel, Belgium, bas.ketsman@vub.be | Paraschos Koutris, University of Wisconsin-Madison, USA, paris@cs.wisc.edu

 
Suggested Citation
Bas Ketsman and Paraschos Koutris (2022), "Modern Datalog Engines", Foundations and Trends® in Databases: Vol. 12: No. 1, pp 1-68. http://dx.doi.org/10.1561/1900000073

Publication Date: 29 Jun 2022
© 2022 B. Ketsman and P. Koutris
 
Subjects
Data Models and Query Languages,  Database Theory,  Parallel and Distributed Database Systems,  Query Processing and Optimization
 

In this article:
1. Introduction
2. The Datalog Language
3. Evaluation
4. Data Layouts and Indices
5. Optimizations
6. Conclusion
References

Abstract

Recent years have seen a resurgence of interest in Datalog from both industry and the research community. Datalog is a declarative query language that extends relational algebra with recursion. It has been used to express a wide spectrum of modern data management tasks, such as data integration, declarative networking, graph analysis, business analytics, and program analysis. The result of this long line of research is a plethora of Datalog engines that support different variants of Datalog and have different technical specifications and capabilities. In this monograph, we provide an overview of the architecture and technical characteristics of these Datalog engines. We identify common architectural decisions and evaluation methods, as well as data structures and layouts used to speed up query execution. We also discuss how Datalog engines differ when they specialize for workloads with different characteristics (for example, data analytics vs. program analysis vs. graph analysis). One particular focus is how modern Datalog engines scale to massively parallel environments.

DOI:10.1561/1900000073
ISBN: 978-1-63828-042-2 (paperback), 72 pp., $65.00
ISBN: 978-1-63828-043-9 (e-book, PDF), 72 pp., $145.00

Modern Datalog Engines

Recent years have seen a resurgence of interest in Datalog from both industry and the research community. Datalog is a declarative query language that extends relational algebra with recursion. It is used to express a wide spectrum of modern data management tasks such as data integration, declarative networking, graph analysis, business analytics, and program analysis. The result of this long line of research is a plethora of Datalog engines that support different variants of Datalog and have different technical specifications and capabilities.

In this monograph, the authors provide an overview of the architecture and technical characteristics of the various Datalog engines. They identify common architectural decisions and evaluation methods as well as data structures and layouts used to speed up query execution. They also discuss the ways in which Datalog engines differ when they specialize for workloads with different characteristics. A particular focus of this monograph is how modern Datalog engines scale to massively parallel environments, which is necessary to support the processing of very large datasets. The authors conclude with directions for future research and possible new applications for Datalog engines.

 
DBS-073