
Model-based Reinforcement Learning: A Survey

By Thomas M. Moerland, LIACS, Leiden University, The Netherlands, t.m.moerland@liacs.leidenuniv.nl | Joost Broekens, LIACS, Leiden University, The Netherlands | Aske Plaat, LIACS, Leiden University, The Netherlands | Catholijn M. Jonker, Interactive Intelligence, TU Delft, and LIACS, Leiden University, The Netherlands

 
Suggested Citation
Thomas M. Moerland, Joost Broekens, Aske Plaat and Catholijn M. Jonker (2023), "Model-based Reinforcement Learning: A Survey", Foundations and Trends® in Machine Learning: Vol. 16: No. 1, pp 1-118. http://dx.doi.org/10.1561/2200000086

Publication Date: 04 Jan 2023
© 2023 T. M. Moerland et al.
 
Subjects
Reinforcement Learning, Deep Learning, Planning and Control
 


Abstract

Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is an important challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This survey presents an integration of both fields, better known as model-based reinforcement learning. Model-based RL has two main steps: dynamics model learning and planning-learning integration. First, we systematically cover approaches to dynamics model learning, including challenges like dealing with stochasticity, uncertainty, partial observability, and temporal abstraction. Second, we present a systematic categorization of planning-learning integration, including aspects like: where to start planning, what budgets to allocate to planning and real data collection, how to plan, and how to integrate planning in the learning and acting loop. After these two sections, we also discuss implicit model-based RL as an end-to-end alternative for model learning and planning, and we cover the potential benefits of model-based RL. Along the way, the survey draws connections to several related RL fields, like hierarchical RL and transfer learning. Altogether, the survey presents a broad conceptual overview of the combination of planning and learning for MDP optimization.
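As background on the MDP formalization mentioned above, the following is a standard textbook definition and objective (the notation may differ in detail from the monograph's):

```latex
% A Markov Decision Process (MDP): state space S, action space A,
% transition dynamics T, reward function R, discount factor gamma.
\mathcal{M} = (\mathcal{S}, \mathcal{A}, T, R, \gamma), \qquad
T(s' \mid s, a) = p(s_{t+1} = s' \mid s_t = s, a_t = a)

% MDP optimization: find a policy that maximizes expected discounted return.
\pi^{\star} = \arg\max_{\pi} \; \mathbb{E}_{\pi,\, T}\Big[ \sum_{t=0}^{\infty} \gamma^{t}\, R(s_t, a_t) \Big]
```

Planning methods assume T and R are given; model-based RL instead learns them from interaction and then plans with the learned model.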

DOI: 10.1561/2200000086
ISBN (paperback): 978-1-63828-056-9, 130 pp., $90.00
ISBN (e-book, PDF): 978-1-63828-057-6, 130 pp., $145.00
Table of contents:
1. Introduction
2. Background
3. Categories of Model-based Reinforcement Learning
4. Dynamics Model Learning
5. Integration of Planning and Learning
6. Implicit Model-based Reinforcement Learning
7. Benefits of Model-based Reinforcement Learning
8. Theory of Model-based Reinforcement Learning
9. Related Work
10. Discussion
11. Summary
References

Model-based Reinforcement Learning: A Survey

Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is an important challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This monograph surveys an integration of both fields, better known as model-based reinforcement learning.

Model-based RL has two main steps: dynamics model learning and planning-learning integration. In this comprehensive survey of the topic, the authors first cover dynamics model learning, including challenges such as dealing with stochasticity, uncertainty, partial observability, and temporal abstraction. They then present a systematic categorization of planning-learning integration, including aspects such as: where to start planning, what budgets to allocate to planning and real data collection, how to plan, and how to integrate planning in the learning and acting loop.
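A classic concrete instance of this integration is tabular Dyna-Q (Sutton, 1990), in which each real environment step is followed by a budget of planning updates sampled from the learned model. The sketch below is illustrative rather than code from the monograph; the Gym-style environment interface (`reset`, `step`, `action_space.n`) and all hyperparameters are assumptions.

```python
import random
from collections import defaultdict

def dyna_q(env, episodes=200, alpha=0.1, gamma=0.95,
           epsilon=0.1, planning_steps=10):
    """Tabular Dyna-Q: interleave learning from real experience,
    model learning, and planning updates from the learned model.
    Assumes env.reset() -> state and env.step(a) -> (state, reward, done)."""
    Q = defaultdict(float)        # Q[(s, a)]: action-value estimates
    model = {}                    # model[(s, a)] = (r, s', done): learned dynamics
    actions = list(range(env.action_space.n))

    def greedy(s):
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection in the real environment.
            a = random.choice(actions) if random.random() < epsilon else greedy(s)
            s2, r, done = env.step(a)

            # (1) Direct RL: Q-learning update from the real transition.
            target = r + (0.0 if done else gamma * Q[(s2, greedy(s2))])
            Q[(s, a)] += alpha * (target - Q[(s, a)])

            # (2) Model learning: store the observed transition.
            model[(s, a)] = (r, s2, done)

            # (3) Planning: replay simulated transitions from the model.
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2, pdone) = random.choice(list(model.items()))
                ptarget = pr + (0.0 if pdone else gamma * Q[(ps2, greedy(ps2))])
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])

            s = s2
    return Q
```

The `planning_steps` parameter directly instantiates the budget trade-off described above between planning and real data collection: with `planning_steps=0` the algorithm reduces to plain Q-learning.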

In conclusion, the authors discuss implicit model-based RL as an end-to-end alternative for model learning and planning, and cover the potential benefits of model-based RL. Along the way, they draw connections to several related RL fields, including hierarchical RL and transfer learning.

This monograph contains a broad conceptual overview of the combination of planning and learning for Markov Decision Process optimization. It provides a clear and complete introduction to the topic for students and researchers alike.

 
MAL-086