
Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms

By Vaneet Aggarwal, Purdue University, USA, vaneet@purdue.edu | Washim Uddin Mondal, Indian Institute of Technology Kanpur, India, wmondal@iitk.ac.in | Qinbo Bai, Purdue University, USA, bai113@purdue.edu

 
Suggested Citation
Vaneet Aggarwal, Washim Uddin Mondal and Qinbo Bai (2024), "Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms", Foundations and Trends® in Optimization: Vol. 6: No. 4, pp 193-298. http://dx.doi.org/10.1561/2400000038

Publication Date: 21 Aug 2024
© 2024 V. Aggarwal et al.
 
Subjects
Statistical learning theory,  Reinforcement learning,  Online learning,  Stochastic optimization
 


Abstract

Reinforcement Learning (RL) serves as a versatile framework for sequential decision-making, finding applications across diverse domains such as robotics, autonomous driving, recommendation systems, supply chain optimization, biology, mechanics, and finance. The primary objective of these applications is to maximize the average reward. Real-world scenarios often necessitate adherence to specific constraints during the learning process.
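
To make this concrete, a common way to pose the constrained average-reward problem (the notation below is illustrative and is not taken verbatim from the monograph) is to maximize the long-run average reward subject to a lower bound on a long-run average constraint signal:

\[
\max_{\pi}\; J_r(\pi) := \lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}_\pi\!\Big[\sum_{t=1}^{T} r(s_t,a_t)\Big]
\quad \text{subject to} \quad
J_c(\pi) := \lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}_\pi\!\Big[\sum_{t=1}^{T} c(s_t,a_t)\Big] \ge b,
\]

where r is the per-step reward, c is the per-step constraint signal, and b is the required threshold.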

This monograph focuses on the exploration of various model-based and model-free approaches for Constrained RL within the context of average reward Markov Decision Processes (MDPs). The investigation commences with an examination of model-based strategies, delving into two foundational methods: optimism in the face of uncertainty and posterior sampling. Subsequently, the discussion transitions to parameterized model-free approaches, where the primal-dual policy gradient-based algorithm is explored as a solution for constrained MDPs. The monograph provides regret guarantees and analyzes constraint violation for each of the discussed setups.
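
As a rough illustration of the primal-dual idea in the model-free part, the Python sketch below (synthetic MDP, assumed notation; it is not the monograph's actual algorithm) ascends a crude REINFORCE-style estimate of the gradient of the Lagrangian J_r + lambda * (J_c - b) in the policy parameters, while a projected descent step on the dual variable lambda prices constraint violation:

# Illustrative primal-dual policy gradient for a constrained average-reward MDP.
# A minimal sketch with hypothetical names; not the monograph's algorithm.
import numpy as np

rng = np.random.default_rng(0)
S, A = 4, 2                                  # small synthetic MDP
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel: P[s, a] is a distribution over next states
R = rng.uniform(size=(S, A))                 # reward r(s, a)
C = rng.uniform(size=(S, A))                 # constraint signal c(s, a)
b = 0.5                                      # require the long-run average of c to be at least b

theta = np.zeros((S, A))                     # softmax policy parameters
lam = 0.0                                    # dual variable (Lagrange multiplier)
alpha, eta = 0.05, 0.05                      # primal / dual step sizes

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

for it in range(2000):
    # Roll out one finite window and accumulate reward, cost, and score-function terms.
    s, T = 0, 200
    G_r = G_c = 0.0
    grad = np.zeros_like(theta)
    for t in range(T):
        probs = policy(s)
        a = rng.choice(A, p=probs)
        grad[s] -= probs              # d/dtheta log pi(a|s) for a softmax policy ...
        grad[s, a] += 1.0             # ... equals one-hot(a) - pi(.|s)
        G_r += R[s, a]
        G_c += C[s, a]
        s = rng.choice(S, p=P[s, a])
    J_r, J_c = G_r / T, G_c / T       # Monte Carlo estimates of the average reward / cost

    # Primal ascent on the Lagrangian J_r + lam * (J_c - b), using a crude REINFORCE-style estimate.
    theta += alpha * (J_r + lam * (J_c - b)) * grad / T
    # Dual projected descent: lam grows when the constraint J_c >= b is violated.
    lam = max(0.0, lam - eta * (J_c - b))

print(f"avg reward {J_r:.3f}, avg cost {J_c:.3f} (target >= {b}), lambda {lam:.3f}")

A learning algorithm of this kind is typically evaluated through its cumulative reward regret against the best feasible policy and its cumulative constraint violation, which is what the regret and violation guarantees mentioned above refer to.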

For the above exploration, we assume the underlying MDP to be ergodic. Further, this monograph extends its discussion to encompass results tailored for weakly communicating MDPs, thereby broadening the scope of its findings and their relevance to a wider range of practical scenarios.
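
For orientation, in a weakly communicating MDP the optimal long-run average reward \rho^* is the same for every starting state and, together with a bias (relative value) function h, satisfies the average-reward Bellman optimality equation (standard background, not specific to this monograph):

\[
\rho^* + h(s) = \max_{a}\Big\{ r(s,a) + \sum_{s'} P(s'\mid s,a)\, h(s') \Big\} \qquad \text{for all } s,
\]

whereas ergodicity is the stronger requirement that every stationary policy induces an irreducible Markov chain over the entire state space.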

DOI: 10.1561/2400000038
ISBN (paperback): 978-1-63828-396-6, 116 pp., $80.00
ISBN (e-book, PDF): 978-1-63828-397-3, 116 pp., $155.00
Table of contents:
1. Introduction
2. Model-Based RL
3. Parameterized Model-Free RL
4. Beyond Ergodic MDPs
Acknowledgements
References

 