By Faisal Nawab, University of California, Irvine, USA, nawabf@uci.edu | Mohammad Sadoghi, University of California, Davis, USA, msadoghi@ucdavis.edu
The problem of distributed consensus has played a major role in the development of distributed data management systems, including the development of distributed atomic commit and replication protocols. In this monograph, we present the foundations of consensus protocols and the ways they have been used to solve distributed data management problems. We also discuss how distributed consensus contributes to the development of emerging blockchain systems, including an exploration of consensus protocols and their use in systems with malicious actors and arbitrary faults.
Our approach is to start with the basics of representative consensus protocols: we begin with classic consensus protocols and show how they can be extended to improve performance, add features, or adapt to different system models. Then, we show how consensus can be used as a tool in the development of distributed data management systems. For each data management problem, we start by showing a basic solution to the problem and highlighting the shortcomings that invite the use of consensus. Then, we demonstrate the integration of consensus to overcome these shortcomings and provide desired design features. We provide examples of each type of integration of consensus in distributed data management, as well as an analysis of the integration and its implications.
Consensus is the problem of making distributed nodes reach agreement. It is a basic building block that can be used in more complex distributed data management systems while retaining correctness guarantees of the state of the data and its recovery. Handling the intricacies of distributed coordination, network uncertainties, and failures in such complex data management problems is a daunting challenge. This has led many system designers to use consensus as a tool for building more complex distributed protocols. Consensus has thus influenced data management systems and research for many decades.
This monograph provides a foundation for the reader to understand the use of consensus protocols in data management systems and aims to empower data management researchers and practitioners to pursue work that uses and innovates on consensus for their data management applications. It presents the foundations of consensus and of consensus in data management by pointing out work that has been influential or representative of the data management areas the authors explore. They start with an introduction to the principles of consensus and then present background on the use of consensus in data management. They show how consensus is used for the distributed atomic commit problem and how it is used in replication protocols where data copies are distributed across different nodes. They further expand the scope of the crash-tolerant commit protocols to handle arbitrary failures by exploring the seminal fault-tolerant consensus protocol known as Practical Byzantine Fault Tolerance (PBFT). For each data management problem, the authors present a basic solution and highlight the shortcomings that invite the use of consensus. They then demonstrate the integration of consensus to overcome these shortcomings and provide desired design features, providing examples of each type of integration of consensus as well as an analysis of the integration and its implications. The monograph concludes with a summary and a discussion of future directions.