Foundations and Trends® in Machine Learning > Vol 15 > Issue 1-2

Dynamical Variational Autoencoders: A Comprehensive Review

By Laurent Girin, Université Grenoble Alpes, France, laurent.girin@grenoble-inp.fr | Simon Leglaive, CentraleSupélec, France, simon.leglaive@centralesupelec.fr | Xiaoyu Bie, The National Institute for Research in Digital Science and Technology (Inria), France, xiaoyu.bie@inria.fr | Julien Diard, Université Grenoble Alpes, France, julien.diard@univ-grenoble-alpes.fr | Thomas Hueber, Université Grenoble Alpes, France, thomas.hueber@grenoble-inp.fr | Xavier Alameda-Pineda, Université Grenoble Alpes, France, xavier.alameda-pineda@inria.fr

 
Suggested Citation
Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber and Xavier Alameda-Pineda (2021), "Dynamical Variational Autoencoders: A Comprehensive Review", Foundations and Trends® in Machine Learning: Vol. 15: No. 1-2, pp 1-175. http://dx.doi.org/10.1561/2200000089

Publication Date: 02 Dec 2021
© 2021 L. Girin et al.
 
Subjects
Deep Learning, Variational inference, Dimensionality reduction, Graphical models, Dynamics, Learning and statistical methods, Nonlinear signal processing, Speech/audio/image/video compression, Latent variable models, Time series analysis
 


Abstract

Variational autoencoders (VAEs) are powerful deep generative models widely used to represent high-dimensional complex data through a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, the input data vectors are processed independently. Recently, a series of papers have presented different extensions of the VAE for processing sequential data; relying on recurrent neural networks or state-space models, these extensions model not only the latent space but also the temporal dependencies within a sequence of data vectors and the corresponding latent vectors. In this monograph, we perform a literature review of these models. We introduce and discuss a general class of models, called dynamical variational autoencoders (DVAEs), which encompasses a large subset of these temporal VAE extensions. We then present in detail seven recently proposed DVAE models, with the aim of homogenizing notation and presentation, as well as relating these models to existing classical temporal models. We have reimplemented these seven DVAE models and present the results of an experimental benchmark conducted on the speech analysis-resynthesis task (the PyTorch code is made publicly available). The monograph concludes with a discussion of important issues concerning the DVAE class of models and guidelines for future research.
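To make the setting concrete, here is a minimal PyTorch sketch, written for this summary and not taken from the authors' released code, of the simplest DVAE structure (a deep-Kalman-filter-style model): a temporal prior p(z_t | z_{t-1}) replaces the i.i.d. prior of the vanilla VAE, and a decoder maps each latent vector z_t to an observation x_t. All layer sizes and names are illustrative assumptions; the inference network and training loop are omitted.

```python
import torch
import torch.nn as nn

class MiniDVAE(nn.Module):
    """Toy DVAE generative model: prior p(z_t | z_{t-1}) + decoder p(x_t | z_t)."""
    def __init__(self, x_dim=257, z_dim=16, h_dim=64):
        super().__init__()
        self.z_dim = z_dim
        # Temporal prior network: maps z_{t-1} to the mean and log-variance of z_t.
        self.prior = nn.Sequential(
            nn.Linear(z_dim, h_dim), nn.Tanh(), nn.Linear(h_dim, 2 * z_dim))
        # Decoder network: maps z_t to the mean of the observation x_t.
        self.decoder = nn.Sequential(
            nn.Linear(z_dim, h_dim), nn.Tanh(), nn.Linear(h_dim, x_dim))

    @torch.no_grad()
    def generate(self, T, batch=1):
        z_t = torch.zeros(batch, self.z_dim)  # z_0: arbitrary starting point
        xs = []
        for _ in range(T):
            mean, logvar = self.prior(z_t).chunk(2, dim=-1)
            # Ancestral sampling through time, in reparameterized form.
            z_t = mean + (0.5 * logvar).exp() * torch.randn_like(mean)
            xs.append(self.decoder(z_t))
        return torch.stack(xs, dim=1)  # shape: (batch, T, x_dim)

x_seq = MiniDVAE().generate(T=50)
print(x_seq.shape)  # torch.Size([1, 50, 257])
```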

DOI:10.1561/2200000089
ISBN: 978-1-68083-912-8 (paperback), 197 pp., $99.00
ISBN: 978-1-68083-913-5 (e-book, PDF), 197 pp., $280.00
Table of contents:
1. Introduction
2. Variational Autoencoders
3. Recurrent Neural Networks and State Space Models
4. Definition of Dynamical VAEs
5. Deep Kalman Filters
6. Kalman Variational Autoencoders
7. STOchastic Recurrent Networks
8. Variational Recurrent Neural Networks
9. Stochastic Recurrent Neural Networks
10. Recurrent Variational Autoencoders
11. Disentangled Sequential Autoencoders
12. Brief tour of other models
13. Experiments
14. Discussion
Acknowledgements
Appendices
References

Dynamical Variational Autoencoders: A Comprehensive Review

Variational autoencoders (VAEs) are powerful deep generative models widely used to represent high-dimensional complex data through a low-dimensional latent space learned in an unsupervised manner. In this monograph, the authors introduce and discuss a general class of models, called dynamical variational autoencoders (DVAEs), which extend VAEs to model temporal vector sequences. In doing so, the authors provide:

• a formal definition of the general class of DVAEs (its generative factorization is sketched after this list)

• a detailed and complete technical description of seven DVAE models

• a rapid overview of other DVAE models presented in the recent literature

• a discussion of recent developments in DVAEs in relation to the history and technical background of the classical models on which DVAEs are built

• a quantitative benchmark of the selected DVAE models

• a discussion to put the DVAE class of models into perspective
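As a taste of that formal definition, and using the monograph's notation, the general (causal) DVAE generative model factorizes the joint distribution of an observed sequence x_{1:T} and a latent sequence z_{1:T} over time; each reviewed model is obtained by dropping some of the conditioning variables:

```latex
p(x_{1:T}, z_{1:T}) = \prod_{t=1}^{T}
    p(x_t \mid x_{1:t-1}, z_{1:t}) \,
    p(z_t \mid x_{1:t-1}, z_{1:t-1})
```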

This monograph is a comprehensive review of the current state of the art in DVAEs. In a concise, easy-to-read form, it gives the reader an accessible summary of the technical aspects of the different DVAE models, their connections with classical models, their cross-connections, and their unification within the DVAE class.

The authors have put considerable effort into unifying the terminology and notation used across the various models, making this an invaluable resource for all students, researchers, and practitioners working in machine learning.

 
MAL-089