Foundations and Trends® in Databases > Vol 6 > Issue 1-2

Crowdsourced Data Management: Industry and Academic Perspectives

By Adam Marcus, Unlimited Labs, USA, marcua@marcua.net | Aditya Parameswaran, University of Illinois at Urbana-Champaign, USA, adityagp@illinois.edu

 
Suggested Citation
Adam Marcus and Aditya Parameswaran (2015), "Crowdsourced Data Management: Industry and Academic Perspectives", Foundations and Trends® in Databases: Vol. 6: No. 1-2, pp 1-161. http://dx.doi.org/10.1561/1900000044

Publication Date: 22 Dec 2015
© 2015 A. Marcus and A. Parameswaran
 
Subjects
Approximate and Interactive Query Processing, Data Cleaning and Information Extraction, Data Models and Query Languages, Probabilistic Data Management, Query Processing and Optimization, Computer Supported Cooperative Work: Organizational issues, Computer Supported Cooperative Work: Communication technologies, Information extraction, Topic detection and tracking, Usability, interactivity, and visualization issues in IR, Classification and prediction, Collective Intelligence
 
Keywords
Crowdsourcing
 

In this article:
1. Introduction
2. Related Work
3. An Overview of Crowd-Powered Algorithms
4. An Overview of Crowd-Powered Systems
5. Survey of Industry Users: Summary and Methodology
6. Survey of Industry Users: Crowd Statistics and Management
7. Survey of Industry Users: Use cases and Prior Approaches
8. Survey of Industry Users: Task Quality, Worker Incentives, and Workflow Decomposition
9. Survey of Marketplace Providers of Crowdsourcing
10. Conclusion
Acknowledgements
References

Abstract

Crowdsourcing and human computation enable organizations to accomplish tasks that fully automated techniques cannot currently complete, or that require more flexibility and scalability than traditional employment relationships can facilitate. In the area of data processing, companies have benefited from crowd workers on platforms such as Amazon’s Mechanical Turk or Upwork to complete tasks as varied as content moderation, web content extraction, entity resolution, and video/audio/image processing. Several academic researchers from diverse areas, ranging from the social sciences to computer science, have embraced crowdsourcing as a research area, resulting in algorithms and systems that improve crowd work quality, latency, or cost. Given the relative nascence of the field, the academic and practitioner communities have largely operated independently of each other for the past decade, rarely exchanging techniques and experiences. In this monograph, we aim to narrow the gap between academics and practitioners. On the academic side, we summarize the state of the art in crowd-powered algorithms and system design tailored to large-scale data processing. On the industry side, we survey 13 industry users (e.g., Google, Facebook, Microsoft) and 4 marketplace providers of crowd work (e.g., CrowdFlower, Upwork) to identify how hundreds of engineers and tens of millions of dollars are invested in various crowdsourcing solutions. Through the monograph, we hope to simultaneously introduce academics to real problems that practitioners encounter every day, and provide a survey of the state of the art for practitioners to incorporate into their designs. Through our surveys, we also highlight the fact that crowd-powered data processing is a large and growing field. Over the next decade, we believe that most technical organizations will in some way benefit from crowd work, and hope that this monograph can help guide the effective adoption of crowdsourcing across these organizations.

DOI:10.1561/1900000044
ISBN: 978-1-68083-090-3 (paperback), 184 pp., $99.00
ISBN: 978-1-68083-091-0 (e-book, PDF), 184 pp., $250.00

Crowdsourced Data Management

Crowdsourcing and human computation enable organizations to accomplish tasks that fully automated techniques cannot currently complete, or that require more flexibility and scalability than traditional employment relationships can facilitate. In the area of data processing, companies have benefited from crowd workers on platforms such as Amazon’s Mechanical Turk or Upwork to complete tasks as varied as content moderation, web content extraction, entity resolution, and video/audio/image processing. Several academic researchers from diverse areas, ranging from the social sciences to computer science, have embraced crowdsourcing as a research area, resulting in algorithms and systems that improve crowd work quality, latency, and cost. Given the relative nascence of the field, the academic and practitioner communities have largely operated independently of each other for the past decade, rarely exchanging techniques and experiences.

Crowdsourced Data Management: Industry and Academic Perspectives aims to narrow the gap between academics and practitioners. On the academic side, it summarizes the state of the art in crowd-powered algorithms and system design tailored to large-scale data processing. On the industry side, it surveys 13 industry users (such as Google, Facebook, and Microsoft) and four marketplace providers of crowd work (such as CrowdFlower and Upwork) to identify how hundreds of engineers and tens of millions of dollars are invested in various crowdsourcing solutions.

Crowdsourced Data Management: Industry and Academic Perspectives simultaneously introduces academics to real problems that practitioners encounter every day, and provides a survey of the state of the art for practitioners to incorporate into their designs. Through the surveys, it also highlights the fact that crowd-powered data processing is a large and growing field. Over the next decade, most technical organizations are likely to benefit in some way from crowd work, and this monograph can help guide the effective adoption of crowdsourcing across these organizations.

 
DBS-044