By Xiuyuan Lu, DeepMind, USA, lxlu@deepmind.com | Benjamin Van Roy, DeepMind, USA, benvanroy@deepmind.com | Vikranth Dwaracherla, DeepMind, USA, vikranthd@deepmind.com | Morteza Ibrahimi, DeepMind, USA, mibrahimi@deepmind.com | Ian Osband, DeepMind, USA, iosband@deepmind.com | Zheng Wen, DeepMind, USA, zhengwen@deepmind.com
Reinforcement learning agents have demonstrated remarkable achievements in simulated environments. Data efficiency, however, poses an impediment to carrying this success over to real environments. The design of data-efficient agents calls for a deeper understanding of information acquisition and representation. We discuss concepts and regret analysis that together offer principled guidance for agent design. This line of thinking sheds light on questions of what information to seek, how to seek that information, and what information to retain. To illustrate these concepts, we design simple agents that build on them and present computational results that highlight data efficiency.
This tutorial offers a framework that can guide these agent design decisions. The framework is inspired in part by concepts from information theory, which has grappled with data efficiency for many years in the design of communication systems.
This tutorial will be of interest to students and researchers working in reinforcement learning, as well as to information theorists wishing to apply their knowledge to practical reinforcement learning problems.