By Sungjin Im, University of California, Merced, USA, sim3@ucmerced.edu | Ravi Kumar, Google, Mountain View, USA, ravi.k53@gmail.com | Silvio Lattanzi, Google, Barcelona, Spain, silviol@google.com | Benjamin Moseley, Carnegie Mellon University, USA, moseleyb@andrew.cmu.edu | Sergei Vassilvitskii, Google, New York, USA, sergeiv@google.com
The algorithms community has been modeling the underlying key features and constraints of massively parallel frameworks and using these models to discover new algorithmic techniques tailored to them. This monograph focuses on the Massively Parallel Computation (MPC) model, also known in the literature as the MapReduce model. It describes algorithmic tools that have been developed to leverage the unique features of the MPC framework, chosen for their broad applicability: each can serve as a building block for designing new algorithms. While not exhaustive, the monograph covers partitioning and coresets, sample and prune, dynamic programming, round compression, and lower bounds.
The modern era is witnessing a revolution in the ability to scale computations to massively large data sets. A key breakthrough in scalability was the introduction of fast and easy-to-use distributed programming frameworks such as MapReduce, whose key features and constraints are abstracted by the Massively Parallel Computation (MPC) model.
In this monograph, the authors describe in detail certain tools available in the framework that are generally applicable and can be used as building blocks to design algorithms in the area. These include partitioning and coresets, sample and prune, dynamic programming, round compression, and lower bounds.
This monograph provides the reader with an accessible introduction to the most important algorithmic tools of the MPC framework, which can be used to design new algorithms for systems built on massively parallel computation.