
Energy-Based Models with Applications to Speech and Language Processing

By Zhijian Ou, Tsinghua University, China, ozj@tsinghua.edu.cn

 
Suggested Citation
Zhijian Ou (2024), "Energy-Based Models with Applications to Speech and Language Processing", Foundations and Trends® in Signal Processing: Vol. 18: No. 1-2, pp. 1-199. http://dx.doi.org/10.1561/2000000117

Publication Date: 14 Mar 2024
© 2024 Z. Ou
 
Subjects
Statistical signal processing, Statistical/machine learning, Speech and spoken language processing, Stochastic optimization, Deep learning, Graphical models, Markov chain Monte Carlo, Variational inference, Classification and prediction, Probability and statistics
 


Abstract

Energy-Based Models (EBMs) are an important class of probabilistic models, also known as random fields and undirected graphical models. EBMs are un-normalized and thus radically different from other popular self-normalized probabilistic models such as hidden Markov models (HMMs), autoregressive models, generative adversarial nets (GANs) and variational auto-encoders (VAEs). In recent years, EBMs have attracted increasing interest not only from core machine learning but also from application domains such as speech, vision, and natural language processing (NLP), with significant theoretical and algorithmic progress. To the best of our knowledge, there are no review papers about EBMs with applications to speech and language processing. The sequential nature of speech and language also presents special challenges and requires treatment different from that of fixed-dimensional data (e.g., images).
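To make the un-normalized nature concrete, an EBM defines a distribution through an energy function; the notation below (energy $E_\theta$, partition function $Z(\theta)$) follows the standard convention in this literature rather than text quoted from the monograph:

% standard EBM formulation; notation is conventional, not quoted from the monograph
$$p_\theta(x) \;=\; \frac{\exp\bigl(-E_\theta(x)\bigr)}{Z(\theta)}, \qquad Z(\theta) \;=\; \sum_{x} \exp\bigl(-E_\theta(x)\bigr),$$

where the partition function $Z(\theta)$ sums (or integrates) over all configurations and is typically intractable. This intractability is what distinguishes EBMs from self-normalized models such as HMMs and autoregressive models, and it motivates the sampling and learning methods surveyed here.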

The purpose of this monograph is to present a systematic introduction to energy-based models, including both algorithmic progress and applications in speech and language processing, organized into four main sections. First, we will introduce the basics of EBMs, including classic models, recent models parameterized by neural networks, sampling methods, and various learning methods, from classic learning algorithms to the most advanced ones. The next three sections will present how to apply EBMs in three different scenarios, i.e., for modeling marginal, conditional and joint distributions, respectively: 1) EBMs for sequential data with applications in language modeling, where we are mainly concerned with the marginal distribution of a sequence itself; 2) EBMs for modeling conditional distributions of target sequences given observation sequences, with applications in speech recognition, sequence labeling and text generation; and 3) EBMs for modeling joint distributions of both observation and target sequences, with applications in semi-supervised learning and calibrated natural language understanding. In addition, we will introduce some open-source toolkits to help readers become familiar with the techniques for developing and applying energy-based models.
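As a brief illustration of why specialized learning algorithms arise (a standard result, not specific to this monograph): the maximum-likelihood gradient for an EBM takes the well-known form

% classic maximum-likelihood gradient identity for EBMs
$$\nabla_\theta \log p_\theta(x) \;=\; -\nabla_\theta E_\theta(x) \;+\; \mathbb{E}_{x' \sim p_\theta}\bigl[\nabla_\theta E_\theta(x')\bigr],$$

where the second term is an expectation under the model itself and must usually be approximated, e.g., by Markov chain Monte Carlo. This need to approximate model expectations is the common thread running through the classic and advanced learning methods covered in the basics section.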

DOI: 10.1561/2000000117
Print ISBN: 978-1-63828-306-5, 222 pp., $99.00 (paperback)
E-book ISBN: 978-1-63828-307-2, 222 pp., $310.00 (PDF)
Table of contents:
1. Introduction
2. Basics for EBMs
3. EBMs for Sequential Data With Applications in Language Modeling
4. Conditional EBMs With Applications
5. Joint EBMs With Applications
6. Conclusions
Acknowledgements
Appendices
References
