By Pierre Alquier, ESSEC Business School, Asia-Pacific Campus, Singapore, pierre.alquier.stat@gmail.com
Aggregated predictors are obtained by making a set of basic predictors vote according to some weights, that is, according to some probability distribution. Randomized predictors are obtained by sampling from a set of basic predictors, according to some prescribed probability distribution.
Thus, aggregated and randomized predictors have in common that their definition relies on a probability distribution on the set of predictors. In statistical learning theory, there is a set of tools designed to understand the generalization ability of such predictors: PAC-Bayesian bounds, or PAC-Bayes bounds.
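To make this concrete, here is a minimal illustration, using notation that is not introduced above and is only assumed for this example: let $(f_\theta)_{\theta\in\Theta}$ denote the set of basic predictors and let $\rho$ be a probability distribution on $\Theta$. The aggregated predictor averages the predictions,
$$
\hat{f}_\rho(x) = \int_\Theta f_\theta(x)\,\rho(\mathrm{d}\theta),
$$
while a randomized predictor is $f_{\hat{\theta}}$ for a single random draw $\hat{\theta}\sim\rho$. In both cases, the object to be analyzed is determined by the distribution $\rho$ rather than by a single fixed predictor.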
Since the original PAC-Bayes bounds (Shawe-Taylor and Williamson, 1997; McAllester, 1998), these tools have been considerably improved in many directions. We will, for example, describe a simplified version of the localization technique (Catoni, 2003; Catoni, 2007) that was initially overlooked by the community and later rediscovered as "mutual information bounds". Very recently, PAC-Bayes bounds have received considerable attention. There was a workshop on PAC-Bayes at NIPS 2017, (Almost) 50 Shades of Bayesian Learning: PAC-Bayesian trends and insights, organized by B. Guedj, F. Bach and P. Germain. One of the reasons for this recent interest is the successful application of these bounds to neural networks (Dziugaite and Roy, 2017). Since then, PAC-Bayes has been a recurring topic of workshops at the major machine learning conferences.
The objective of these notes is to provide an elementary introduction to PAC-Bayes bounds.
Probably approximately correct (PAC) bounds have been an intensive field of research over the last two decades. Hundreds of papers have been published and much progress has been made, with PAC-Bayes bounds becoming an important technique in machine learning.
The proliferation of research has made the field somewhat daunting for a newcomer. In this tutorial, the author guides the reader through the topic's complexity and large body of publications. Covering both empirical and oracle PAC-Bayes bounds, this book serves as a primer for students and researchers who want to get to grips with the subject quickly. It provides a friendly introduction that illuminates the basic theory and points to the most important publications for gaining a deeper understanding of any particular aspect.