Foundations and Trends® in Machine Learning, Vol. 7, Issue 2-3

Theory of Disagreement-Based Active Learning

By Steve Hanneke, steve.hanneke@gmail.com

 
Suggested Citation
Steve Hanneke (2014), "Theory of Disagreement-Based Active Learning", Foundations and Trends® in Machine Learning: Vol. 7: No. 2-3, pp. 131-309. http://dx.doi.org/10.1561/2200000037

Publication Date: 12 Jun 2014
© 2014 S. Hanneke
 
Subjects
Statistical Learning Theory,  Classification and Prediction,  Computational Learning
 


Abstract

Active learning is a protocol for supervised machine learning, in which a learning algorithm sequentially requests the labels of selected data points from a large pool of unlabeled data. This contrasts with passive learning, where the labeled data are taken at random. The objective in active learning is to produce a highly-accurate classifier, ideally using fewer labels than the number of random labeled data points sufficient for passive learning to achieve the same accuracy. This article describes recent advances in our understanding of the theoretical benefits of active learning, and implications for the design of effective active learning algorithms. Much of the article focuses on a particular technique, namely disagreement-based active learning, which by now has amassed a mature and coherent literature. It also briefly surveys several alternative approaches from the literature. The emphasis is on theorems regarding the performance of a few general algorithms, including rigorous proofs where appropriate. However, the presentation is intended to be pedagogical, focusing on results that illustrate fundamental ideas, rather than obtaining the strongest or most general known theorems. The intended audience includes researchers and advanced graduate students in machine learning and statistics, interested in gaining a deeper understanding of the recent and ongoing developments in the theory of active learning.
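
To make the protocol concrete, the following is a minimal, illustrative sketch (not taken from the text) of a disagreement-based query rule in the style of the CAL algorithm on which this line of work builds: a label is requested only for points on which the surviving hypotheses still disagree. The hypothesis class (one-dimensional thresholds), the pool, and the names `version_space` and `label_oracle` are assumptions made for illustration, and the realizable (noise-free) setting is assumed.

```python
# A minimal sketch of disagreement-based (CAL-style) active learning on a toy
# class of 1-D threshold classifiers, in the realizable case. Illustrative
# only; not the monograph's pseudocode.

import numpy as np


def make_threshold_classifiers(thresholds):
    """Each hypothesis labels x as +1 iff x >= its threshold."""
    return [(t, lambda x, t=t: 1 if x >= t else -1) for t in thresholds]


def in_disagreement_region(version_space, x):
    """x lies in DIS(V): at least two hypotheses in V assign different labels."""
    labels = {h(x) for _, h in version_space}
    return len(labels) > 1


def cal_active_learner(pool, version_space, label_oracle):
    """Process the unlabeled pool in sequence; request a label only when the
    current version space disagrees on the point, then discard hypotheses
    inconsistent with the returned label."""
    num_queries = 0
    for x in pool:
        if not in_disagreement_region(version_space, x):
            continue  # all surviving hypotheses agree; no label needed
        y = label_oracle(x)
        num_queries += 1
        version_space = [(t, h) for (t, h) in version_space if h(x) == y]
    return version_space, num_queries


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_threshold = 0.37
    pool = rng.uniform(0.0, 1.0, size=500)
    hypotheses = make_threshold_classifiers(np.linspace(0.0, 1.0, 101))
    oracle = lambda x: 1 if x >= true_threshold else -1
    final_vs, queries = cal_active_learner(pool, hypotheses, oracle)
    print(f"labels requested: {queries} of {len(pool)} pool points")
    print(f"surviving thresholds: {min(t for t, _ in final_vs):.2f}"
          f" to {max(t for t, _ in final_vs):.2f}")
```

On this toy problem the learner typically requests only a small fraction of the 500 pool labels, which is the kind of label savings the theory surveyed here quantifies.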

DOI:10.1561/2200000037
ISBN: 978-1-60198-808-9 (paperback), 194 pp., $99.00
ISBN: 978-1-60198-809-6 (e-book, PDF), 194 pp., $240.00
Table of contents:
1. Introduction
2. Basic Definitions and Notation
3. A Brief Review of Passive Learning
4. Lower Bounds on the Label Complexity
5. Disagreement-Based Active Learning
6. Computational Efficiency via Surrogate Losses
7. Bounding the Disagreement Coefficient
8. A Survey of Other Topics and Techniques
References

Theory of Disagreement-Based Active Learning

Active learning is a protocol for supervised machine learning in which a learning algorithm sequentially requests the labels of selected data points from a large pool of unlabeled data. This contrasts with passive learning, where the labeled data are taken at random. The objective in active learning is to produce a highly-accurate classifier, ideally using fewer labels than the number of random labeled data points sufficient for passive learning to achieve the same accuracy.

Theory of Disagreement-Based Active Learning describes recent advances in our understanding of the theoretical benefits of active learning, and implications for the design of effective active learning algorithms. Much of the monograph focuses on a particular technique, namely disagreement-based active learning, which by now has amassed a mature and coherent literature. It also briefly surveys several alternative approaches from the literature. The emphasis is on theorems regarding the performance of a few general algorithms, including rigorous proofs where appropriate. However, the presentation is intended to be pedagogical, focusing on results that illustrate fundamental ideas rather than obtaining the strongest or most general known theorems.
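
As background for the disagreement-based technique named above, the following display recalls the standard definitions of the region of disagreement and of the disagreement coefficient (the quantity studied in Chapter 7). These are the conventional definitions from this literature; the notation may differ in minor ways from the monograph's own conventions.

```latex
% Region of disagreement of a set of classifiers V over instance space X:
\[
\mathrm{DIS}(V) \;=\; \bigl\{\, x \in \mathcal{X} \;:\; \exists\, h, h' \in V
  \text{ with } h(x) \neq h'(x) \,\bigr\}.
\]
% Disagreement coefficient of a target classifier f^* under the marginal
% distribution P over X, where B(f^*, r) denotes the set of classifiers h in
% the hypothesis class with P(h(X) \neq f^*(X)) \le r:
\[
\theta_{f^*}(\varepsilon) \;=\; \sup_{r > \varepsilon}\;
  \frac{P\bigl(\mathrm{DIS}(B(f^*, r))\bigr)}{r}.
\]
```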

Theory of Disagreement-Based Active Learning is intended for researchers and advanced graduate students in machine learning and statistics who are interested in gaining a deeper understanding of the recent and ongoing developments in the theory of active learning.

 
MAL-037