By Adam Marcus, Unlimited Labs, USA, marcua@marcua.net | Aditya Parameswaran, University of Illinois at Urbana-Champaign, USA, adityagp@illinois.edu
Crowdsourcing and human computation enable organizations to accomplish tasks that fully automated techniques cannot currently complete, or that require more flexibility and scalability than traditional employment relationships can facilitate. In the area of data processing, companies have benefited from crowd workers on platforms such as Amazon’s Mechanical Turk or Upwork to complete tasks as varied as content moderation, web content extraction, entity resolution, and video/audio/image processing. Several academic researchers from diverse areas, ranging from the social sciences to computer science, have embraced crowdsourcing as a research area, resulting in algorithms and systems that improve crowd work quality, latency, or cost. Given the relative nascence of the field, the academic and the practitioner communities have largely operated independently of each other for the past decade, rarely exchanging techniques and experiences. In this monograph, we aim to narrow the gap between academics and practitioners. On the academic side, we summarize the state of the art in crowd-powered algorithms and system design tailored to large-scale data processing. On the industry side, we survey 13 industry users (e.g., Google, Facebook, Microsoft) and 4 marketplace providers of crowd work (e.g., CrowdFlower, Upwork) to identify how hundreds of engineers and tens of millions of dollars are invested in various crowdsourcing solutions. Through the monograph, we hope to simultaneously introduce academics to real problems that practitioners encounter every day, and provide a survey of the state of the art for practitioners to incorporate into their designs. Through our surveys, we also highlight the fact that crowd-powered data processing is a large and growing field.
Over the next decade, we believe that most technical organizations will in some way benefit from crowd work, and hope that this monograph can help guide the effective adoption of crowdsourcing across these organizations.
Crowdsourcing and human computation enable organizations to accomplish tasks that fully automated techniques cannot currently complete, or that require more flexibility and scalability than traditional employment relationships can facilitate. In the area of data processing, companies have benefited from crowd workers on platforms such as Amazon’s Mechanical Turk or Upwork to complete tasks as varied as content moderation, web content extraction, entity resolution, and video/audio/image processing. Several academic researchers from diverse areas, ranging from the social sciences to computer science, have embraced crowdsourcing as a research area, resulting in algorithms and systems that improve crowd work quality, latency, and cost. Given the relative nascence of the field, the academic and the practitioner communities have largely operated independently of each other for the past decade, rarely exchanging techniques and experiences.
Crowdsourced Data Management: Industry and Academic Perspectives aims to narrow the gap between academics and practitioners. On the academic side, it summarizes the state of the art in crowd-powered algorithms and system design tailored to large-scale data processing. On the industry side, it surveys 13 industry users – such as Google, Facebook, and Microsoft – and four marketplace providers of crowd work – such as CrowdFlower and Upwork – to identify how hundreds of engineers and tens of millions of dollars are invested in various crowdsourcing solutions.
Crowdsourced Data Management: Industry and Academic Perspectives simultaneously introduces academics to real problems that practitioners encounter every day, and provides a survey of the state of the art for practitioners to incorporate into their designs. Through the surveys, it also highlights the fact that crowd-powered data processing is a large and growing field. Over the next decade, most technical organizations are likely to benefit in some way from crowd work, and this monograph can help guide the effective adoption of crowdsourcing across these organizations.