Foundations and Trends® in Computer Graphics and Vision > Vol 16 > Issue 4

Tutorial on Diffusion Models for Imaging and Vision

By Stanley Chan, School of Electrical and Computer Engineering, Purdue University, USA, stanchan@purdue.edu

 
Suggested Citation
Stanley Chan (2024), "Tutorial on Diffusion Models for Imaging and Vision", Foundations and Trends® in Computer Graphics and Vision: Vol. 16: No. 4, pp 322-471. http://dx.doi.org/10.1561/0600000112

Publication Date: 18 Dec 2024
© 2024 S. Chan
 
Subjects
Image and video retrieval, Tracking, Video analysis and event recognition
 

In this article:
1. Variational Auto-Encoder (VAE)
2. Denoising Diffusion Probabilistic Model (DDPM)
3. Score-Matching Langevin Dynamics (SMLD)
4. Stochastic Differential Equation (SDE)
5. Langevin and Fokker-Planck Equations
6. Conclusion
Acknowledgements
References

Abstract

The astonishing growth of generative tools in recent years has empowered many exciting applications in text-to-image generation and text-to-video generation. The underlying principle behind these generative tools is the concept of diffusion, a particular sampling mechanism that has overcome some longstanding shortcomings in previous approaches. The goal of this tutorial is to discuss the essential ideas underlying these diffusion models. The target audience of this tutorial includes undergraduate and graduate students who are interested in doing research on diffusion models or applying these tools to solve other problems.

DOI:10.1561/0600000112
ISBN: 978-1-63828-432-1 (paperback), 162 pp., $99.00
ISBN: 978-1-63828-433-8 (e-book, PDF), 162 pp., $155.00


 