
Generalization Bounds: Perspectives from Information Theory and PAC-Bayes

By Fredrik Hellström, University College London, UK, f.hellstrom@ucl.ac.uk | Giuseppe Durisi, Chalmers University of Technology, Sweden, durisi@chalmers.se | Benjamin Guedj, Inria, France and University College London, UK, benjamin.guedj@inria.fr | Maxim Raginsky, University of Illinois, USA, maxim@illinois.edu

 
Suggested Citation
Fredrik Hellström, Giuseppe Durisi, Benjamin Guedj and Maxim Raginsky (2025), "Generalization Bounds: Perspectives from Information Theory and PAC-Bayes", Foundations and Trends® in Machine Learning: Vol. 18: No. 1, pp 1-223. http://dx.doi.org/10.1561/2200000112

Publication Date: 23 Jan 2025
© 2025 F. Hellström et al.
 
Subjects
Information theory and computer science,  Information theory and statistics,  Pattern recognition and learning,  Learning and statistical methods,  Statistical/Machine learning,  Statistical learning theory,  Deep learning,  Classification and prediction,  Reinforcement learning,  Design and analysis of algorithms
 


Abstract

A fundamental question in theoretical machine learning is generalization. Over the past decades, the PAC-Bayesian approach has been established as a flexible framework to address the generalization capabilities of machine learning algorithms and design new ones. Recently, it has garnered increased interest due to its potential applicability for a variety of learning algorithms, including deep neural networks. In parallel, an information-theoretic view of generalization has developed, wherein the relation between generalization and various information measures has been established. This framework is intimately connected to the PAC-Bayesian approach, and a number of results have been independently discovered in both strands.
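As a representative illustration of such a result (stated here for context under its usual assumptions, not quoted from the monograph), consider the bound of Xu and Raginsky (2017): if a learning algorithm maps a training set S of n i.i.d. samples to a hypothesis W, and the loss is σ-sub-Gaussian under the data distribution, then the expected generalization error satisfies

\[ \bigl|\,\mathbb{E}\!\left[L_{\mathcal{D}}(W) - L_{S}(W)\right]\bigr| \;\le\; \sqrt{\frac{2\sigma^{2}\, I(W;S)}{n}}, \]

where \(L_{\mathcal{D}}\) and \(L_{S}\) denote the population and empirical risks and \(I(W;S)\) is the mutual information between the hypothesis and the training data. Bounds of this form make precise the intuition that an algorithm which extracts little information from its training data cannot overfit badly.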

In this monograph, we highlight this strong connection and present a unified treatment of PAC-Bayesian and information-theoretic generalization bounds. We present techniques and results that the two perspectives have in common, and discuss the approaches and interpretations that differ. In particular, we demonstrate how many proofs in the area share a modular structure, through which the underlying ideas can be intuited. We pay special attention to the conditional mutual information (CMI) framework, analytical studies of the information complexity of learning algorithms, and the application of the proposed methods to deep learning. This monograph is intended to provide a comprehensive introduction to information-theoretic generalization bounds and their connection to PAC-Bayes, serving as a foundation from which the most recent developments are accessible. It is aimed broadly towards researchers with an interest in generalization and theoretical machine learning.
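To illustrate the CMI idea mentioned above (again a sketch under standard assumptions rather than a statement from the text), the conditional-mutual-information bound of Steinke and Zakynthinou (2020) replaces \(I(W;S)\) with a quantity that is always finite: draw a supersample \(\tilde{Z}\) of 2n i.i.d. examples arranged in n pairs, and let \(U \in \{0,1\}^{n}\) select one example from each pair to form the training set. For a loss bounded in [0,1], the expected generalization error satisfies

\[ \mathbb{E}\!\left[L_{\mathcal{D}}(W) - L_{S}(W)\right] \;\le\; \sqrt{\frac{2\, I(W; U \mid \tilde{Z})}{n}}, \]

where the conditional mutual information \(I(W; U \mid \tilde{Z})\) is at most \(n\log 2\), so the bound never degenerates even when \(I(W;S)\) is infinite.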

DOI: 10.1561/2200000112
ISBN: 978-1-63828-420-8
242 pp. $99.00
 
ISBN: 978-1-63828-421-5
242 pp. $160.00
Table of contents:
1. Introduction: On Generalization and Learning
2. Information-Theoretic Approach to Generalization
3. Tools
4. Generalization Bounds in Expectation
5. Generalization Bounds in Probability
6. The CMI Framework
7. The Information Complexity of Learning Algorithms
8. Neural Networks and Iterative Algorithms
9. Alternative Learning Models
10. Concluding Remarks
Acknowledgements
References

Generalization Bounds: Perspectives from Information Theory and PAC-Bayes

Artificial intelligence and machine learning have emerged as driving forces behind transformative advancements in various fields, and have become increasingly pervasive in many industries and daily life. As these technologies continue to gain momentum, so does the need to develop a deeper understanding of their underlying principles, capabilities, and limitations. In this monograph, the authors focus on the theory of machine learning, in particular statistical learning theory, with emphasis on the generalization capabilities of learning algorithms.

Part I covers the foundations of information-theoretic and PAC-Bayesian generalization bounds for standard supervised learning. Part II explores the applications of generalization bounds, as well as extensions to settings beyond standard supervised learning. Several important areas of application include neural networks, federated learning and reinforcement learning. The monograph concludes with a broader discussion of information-theoretic and PAC-Bayesian generalization bounds as a whole.

This monograph will be of interest to students and researchers working in generalization and theoretical machine learning. It provides a comprehensive introduction to information-theoretic generalization bounds and their connection to PAC-Bayes, serving as a foundation from which the most recent developments are accessible.

 
MAL-112