Hashing, Load Balancing and Multiple Choice

Udi Wieder

doi:10.1561/0400000070

Foundations and Trends® in Theoretical Computer Science > Vol 12 > Issue 3–4

By Udi Wieder, VMware Research, USA, udi.wieder@gmail.com

Suggested Citation

Udi Wieder (2017), "Hashing, Load Balancing and Multiple Choice", Foundations and Trends® in Theoretical Computer Science: Vol. 12: No. 3–4, pp. 275–379. http://dx.doi.org/10.1561/0400000070

Publication Date: 11 Jul 2017

Subjects

Private and Secure Data Management, Theory, Optimization, Data structures, Design and analysis of algorithms, Stochastic Optimization

Abstract

Many tasks in computer systems can be abstracted as distributing items into buckets so that the allocation of items across buckets is as balanced as possible and, furthermore, given an item’s identifier, it is possible to determine quickly which bucket it was assigned to. A canonical example is a dictionary data structure, where ‘items’ stands for key-value pairs and ‘buckets’ for memory locations. Another example is a distributed key-value store, where the buckets represent locations on disk or even whole servers. A third example is a distributed execution engine, where items represent processes and buckets represent compute devices. A common technique in this domain is the use of a hash function that maps an item to a relatively short fixed-length string. The hash function is then used in some way to associate the item with its bucket. The use of a hash function is typically the first step in the solution, and additional algorithmic ideas are required to deal with collisions and the imbalance of hash values. In this monograph we survey some of these techniques. We focus on multiple choice schemes, where items are placed into buckets via the use of several independent hash functions, and typically an item is placed in the least loaded bucket at the time of placement. We analyze the distributions obtained in detail, and show how these ideas can be used to design basic data structures. With respect to data structures we focus on dictionaries, presenting linear probing, cuckoo hashing and many of their variants.
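The multiple choice placement rule described in the abstract can be illustrated with a minimal sketch. This is not code from the monograph: the function names `bucket_choices` and `place_items` are hypothetical, and salting SHA-256 with an index is only an assumed stand-in for "several independent hash functions".

```python
import hashlib


def bucket_choices(item: str, num_buckets: int, d: int) -> list[int]:
    """Return d candidate buckets for an item.

    Assumption: salting one cryptographic hash with the index i stands in
    for d independent hash functions.
    """
    choices = []
    for i in range(d):
        digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
        choices.append(int.from_bytes(digest[:8], "big") % num_buckets)
    return choices


def place_items(items, num_buckets: int, d: int) -> list[int]:
    """Place each item in the least loaded of its d candidate buckets."""
    loads = [0] * num_buckets
    for item in items:
        candidates = bucket_choices(item, num_buckets, d)
        # Ties are broken in favor of the earlier candidate.
        target = min(candidates, key=lambda b: loads[b])
        loads[target] += 1
    return loads


items = [f"item-{k}" for k in range(10_000)]
one_choice = place_items(items, 10_000, d=1)
two_choice = place_items(items, 10_000, d=2)
print("max load, one choice:", max(one_choice))
print("max load, two choices:", max(two_choice))
```

With d = 1 this reduces to simple hashing. With d = 2 the maximum load is typically noticeably smaller, at the cost of checking up to d buckets when looking an item up, since the sketch does not record which candidate was chosen.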

Book details

Paperback: ISBN 978-1-68083-282-2, 120 pp., $85.00

E-book (.pdf): ISBN 978-1-68083-283-9, 120 pp., $260.00

Table of contents:

1. Introduction

2. Simple Hashing - the One Choice Scheme

3. Multiple Choice Schemes

4. The Heavily Loaded Case

5. Dictionaries

Acknowledgments

References

Hashing, Load Balancing and Multiple Choice presents some of the basic algorithmic ideas that underpin many of the practical and theoretically interesting approaches to the problem of distributing items into buckets. It focuses on multiple choice schemes, where items are placed into buckets via the use of several independent hash functions, and typically an item is placed in the least loaded bucket at the time of placement. It analyzes the distributions obtained, and shows how these ideas can be used to design basic data structures. With respect to data structures it focuses on dictionaries, presenting linear probing, cuckoo hashing and many of their variants.
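As a rough illustration of one of the dictionary schemes named above, the following is a minimal sketch of a linear probing table. It is not taken from the monograph: the class name `LinearProbingDict`, the choice to keep the table at most half full, and the omission of deletion are all assumptions made to keep the example short.

```python
class LinearProbingDict:
    """Open-addressing dictionary: on a collision, scan forward to the next slot."""

    def __init__(self, capacity: int = 8):
        self._slots = [None] * capacity  # each slot is None or a (key, value) pair
        self._size = 0

    def _probe(self, key):
        """Yield slot indices starting at the key's hash position, wrapping around."""
        n = len(self._slots)
        start = hash(key) % n
        for step in range(n):
            yield (start + step) % n

    def put(self, key, value):
        # Assumed policy: grow before the table would become more than half full,
        # so every probe sequence is guaranteed to reach an empty slot.
        if (self._size + 1) * 2 > len(self._slots):
            self._grow()
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None:
                self._slots[i] = (key, value)
                self._size += 1
                return
            if slot[0] == key:
                self._slots[i] = (key, value)  # overwrite an existing key
                return

    def get(self, key, default=None):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None:
                return default  # an empty slot ends the run: the key is absent
            if slot[0] == key:
                return slot[1]
        return default

    def _grow(self):
        """Double the table and reinsert every stored pair."""
        old = [s for s in self._slots if s is not None]
        self._slots = [None] * (2 * len(self._slots))
        self._size = 0
        for k, v in old:
            self.put(k, v)


table = LinearProbingDict()
for k in range(100):
    table.put(k, k * k)
print(table.get(7), table.get(1000))
```

Lookups stop at the first empty slot, which is why this sketch leaves out deletion: removing a pair naively would break the probe runs of keys stored after it.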
