Foundations and Trends® in Databases > Vol 2 > Issue 1–2

Privacy-Preserving Data Publishing

By Bee-Chung Chen, Yahoo! Research, USA, beechun@yahoo-inc.com | Daniel Kifer, Penn State University, USA, dkifer@cse.psu.edu | Kristen LeFevre, University of Michigan, USA, klefevre@eecs.umich.edu | Ashwin Machanavajjhala, Yahoo! Research, USA, mvnak@yahoo-inc.com

 
Suggested Citation
Bee-Chung Chen, Daniel Kifer, Kristen LeFevre and Ashwin Machanavajjhala (2009), "Privacy-Preserving Data Publishing", Foundations and Trends® in Databases: Vol. 2: No. 1–2, pp 1-167. http://dx.doi.org/10.1561/1900000008

Publication Date: 20 Oct 2009
© 2009 B.-C. Chen, D. Kifer, K. LeFevre and A. Machanavajjhala
 
Subjects
Private and Secure Data Management
 

In this article:
1. Introduction 
2. Privacy Definitions 
3. Utility Metrics 
4. Mechanisms and Algorithms 
5. Using Sanitized Data 
6. Attacking Sanitized Data 
7. Challenges and Emerging Applications 
8. Conclusions 
Acknowledgments 
References 

Abstract

Privacy is an important issue when one wants to make use of data that involves individuals' sensitive information. Research on protecting the privacy of individuals and the confidentiality of data has received contributions from many fields, including computer science, statistics, economics, and social science. In this paper, we survey research work in privacy-preserving data publishing. This is an area that attempts to answer the problem of how an organization, such as a hospital, government agency, or insurance company, can release data to the public without violating the confidentiality of personal information. We focus on privacy criteria that provide formal safety guarantees, present algorithms that sanitize data to make it safe for release while preserving useful information, and discuss ways of analyzing the sanitized data. Many challenges still remain. This survey provides a summary of the current state-of-the-art, based on which we expect to see advances in years to come.

DOI:10.1561/1900000008
ISBN: 978-1-60198-276-6 (paperback), 180 pp., $99.00
ISBN: 978-1-60198-277-3 (e-book, PDF), 180 pp., $150.00

Privacy-Preserving Data Publishing

This monograph is dedicated to those who have something to hide. It is a book about "privacy-preserving data publishing" – the art of publishing sensitive personal data, collected from a group of individuals, in a form that does not violate their privacy. This problem has numerous and diverse areas of application, including releasing Census data, search logs, medical records, and interactions on a social network.

The purpose of this monograph is to provide a detailed overview of the current state of the art as well as open challenges, focusing particular attention on four key themes:

  • RIGOROUS PRIVACY POLICIES: Repeated and highly publicized attacks on published data have demonstrated that simplistic approaches to data publishing do not work. Significant recent advances have exposed the shortcomings of naive (and not-so-naive) techniques. They have also led to the development of mathematically rigorous definitions of privacy that publishing techniques must satisfy.
  • METRICS FOR DATA UTILITY: While it is necessary to enforce stringent privacy policies, it is equally important to ensure that the published version of the data is useful for its intended purpose. The authors provide an overview of diverse approaches to measuring data utility.
  • ENFORCEMENT MECHANISMS: This book describes in detail various key data publishing mechanisms that guarantee privacy and utility.
  • EMERGING APPLICATIONS: The problem of privacy-preserving data publishing arises in diverse application domains with unique privacy and utility requirements. The authors elaborate on the merits and limitations of existing solutions, based on which we expect to see many advances in years to come.

 
DBS-008

Comment on Section 7.4.1, Page 147, paragraph 2

Commentary Submitted By: Daniel Kifer, Penn State University, dkifer@cse.psu.edu. Date Accepted: 28/10/2010

  • Description: In reference [121], Ghinita et al. do not publish low-dimensional approximations. They generate a multidimensional anonymization of the data by reducing the problem to a 1-D anonymization problem via space filling curves.
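
The reduction described in this comment can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the algorithm of reference [121]: the function names `morton_key` and `anonymize_via_curve` are hypothetical, the Z-order (Morton) curve is used only because it is short to write down, and the fixed-size grouping rule is a simplification. The idea shown is that each 2-D record is mapped to a position on a space-filling curve, the records are sorted by that 1-D position, and consecutive runs of at least k records are published as a bounding box.

```python
from typing import List, Tuple

Point = Tuple[int, int]
# ((x_min, x_max), (y_min, y_max), number_of_records)
Box = Tuple[Tuple[int, int], Tuple[int, int], int]


def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y (Z-order curve position).

    Assumption: coordinates are non-negative integers below 2**bits.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return key


def anonymize_via_curve(points: List[Point], k: int) -> List[Box]:
    """Group 2-D points into boxes of at least k records via a 1-D ordering.

    Hypothetical simplification: groups are consecutive runs of k records
    along the curve, and a short final run is merged into its predecessor.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(points) < k:
        raise ValueError("need at least k records")

    # Step 1: reduce to 1-D by sorting along the space-filling curve.
    ordered = sorted(points, key=lambda p: morton_key(p[0], p[1]))

    # Step 2: solve the 1-D problem by cutting the order into runs of k.
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())

    # Step 3: publish each group as a multidimensional bounding box.
    boxes: List[Box] = []
    for group in groups:
        xs = [p[0] for p in group]
        ys = [p[1] for p in group]
        boxes.append(((min(xs), max(xs)), (min(ys), max(ys)), len(group)))
    return boxes
```

Calling `anonymize_via_curve(records, k=3)` on a list of integer coordinate pairs returns one box per group, each covering at least three records. Because nearby positions on the curve tend to be nearby in the original space, the boxes tend to stay compact, which is the motivation for reducing the multidimensional problem to a 1-D one.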