Foundations and Trends® in Databases > Vol 1 > Issue 4

Provenance in Databases: Why, How, and Where

By James Cheney, University of Edinburgh, UK, jcheney@inf.ed.ac.uk | Laura Chiticariu, IBM Almaden Research Center, USA, chiti@almaden.ibm.com | Wang-Chiew Tan, University of California, USA, wctan@cs.ucsc.edu

 
Suggested Citation
James Cheney, Laura Chiticariu and Wang-Chiew Tan (2009), "Provenance in Databases: Why, How, and Where", Foundations and Trends® in Databases: Vol. 1: No. 4, pp 379-474. http://dx.doi.org/10.1561/1900000006

Publication Date: 02 Jun 2009
© 2009 J. Cheney, L. Chiticariu and W.-C. Tan
 
Subjects
Private and Secure Data Management
 

In this article:
1. Introduction 
2. Why-Provenance 
3. How-Provenance 
4. Where-Provenance 
5. Comparing Models of Provenance 
6. Conclusions 
Acknowledgments 
References 

Abstract

Different notions of provenance for database queries have been proposed and studied in the past few years. In this article, we detail three main notions of database provenance and some of their applications, and we compare and contrast these notions. Specifically, we review why, how, and where provenance, describe the relationships among these notions of provenance, and describe some of their applications in confidence computation, view maintenance and update, debugging, and annotation propagation.

DOI:10.1561/1900000006
ISBN: 978-1-60198-232-2 (paperback), 100 pp., $75.00
ISBN: 978-1-60198-233-9 (e-book, PDF), 100 pp., $100.00

Provenance in Databases

In September 2008, Google News promoted an undated article about United Airlines' near bankruptcy in 2002. In the ensuing panic, the share price of United Airlines dropped by around 75% in a few hours. The panic arose in part because the article lacked provenance that readers could have used to determine that it was out of date. In an increasingly networked world, an understanding of provenance is essential for establishing trust in data stored in databases and exchanged among Web sites. It is also critical to the process of making key business, scientific, and governmental decisions. Modern database systems are capable of producing answers efficiently. However, they generally lack the capability to explain provenance, such as why and how the answers were produced or where the data in the result came from. In recent years, different notions of provenance for database queries have been studied by the authors and a growing community of researchers in databases and scientific computation.

Provenance in Databases reviews research over the past ten years on why, how, and where provenance, clarifies the relationships among these notions of provenance, and describes some of their applications in confidence computation, view maintenance and update, debugging, and annotation propagation. It is intended for engineers and researchers who would like to familiarize themselves with the foundations of database provenance, as well as the many challenges in the field.

 
DBS-006