By Aleksandrs Slivkins , Microsoft Research NYC, USA, slivkins@microsoft.com
Multi-armed bandits a simple but very powerful framework for algorithms that make decisions over time under uncertainty. An enormous body of work has accumulated over the years, covered in several books and surveys. This book provides a more introductory, textbook-like treatment of the subject. Each chapter tackles a particular line of work, providing a self-contained, teachable technical introduction and a brief review of the further developments.
This book gives a broad and accessible introduction to multi-armed bandits, a rich, multi-disciplinary area of increasing importance. The material is teachable by design: each chapter corresponds to one week of a course. There are no prerequisites other than a certain level of mathematical maturity, roughly corresponding to the basic undergraduate course on algorithms.
Multi-armed bandits a simple but very powerful framework for algorithms that make decisions over time under uncertainty. An enormous, multi-dimensional body of work has accumulated over the years.
How to present this work, let alone make it teachable? The book partitions it into a dozen or so big directions. Each chapter handles one direction, covers the first-order concepts and results on a technical level, and provides a detailed literature review for further exploration. While most of the book is on learning theory, the last three chapters cover various connections to economics and operations research.
The book aims to convey that multi-armed bandits are both deeply theoretical and deeply practical. Apart from all the math, the book is careful about motivation, and discusses the practical aspects in considerable detail (based on the system for contextual bandits developed at Microsoft Research).
Lecturers can use this book for an introductory course on the subject. Such course would be complementary to graduate-level courses on online convex optimization and reinforcement learning.