By Marc Peter Deisenroth, Technische Universität Darmstadt, Germany and Imperial College London, UK, marc@ias.tu-darmstadt.de | Gerhard Neumann, Technische Universität Darmstadt, Germany, neumann@ias.tu-darmstadt.de | Jan Peters, Technische Universität Darmstadt, Germany and Max Planck Institute for Intelligent Systems, Germany, peters@ias.tu-darmstadt.de
Policy search is a subfield of reinforcement learning that focuses on finding good parameters for a given policy parametrization. It is well suited for robotics because it can cope with high-dimensional state and action spaces, one of the main challenges in robot learning. We review recent successes of both model-free and model-based policy search in robot learning.
Model-free policy search is a general approach to learning policies based on sampled trajectories. We classify model-free methods based on their policy evaluation strategy, policy update strategy, and exploration strategy, and present a unified view of existing algorithms. Learning a policy is often easier than learning an accurate forward model, and, hence, model-free methods are more frequently used in practice. However, each sampled trajectory requires interaction with the robot, which can be time-consuming and challenging in practice. Model-based policy search addresses this problem by first learning a simulator of the robot's dynamics from data. Subsequently, the simulator generates trajectories that are used for policy learning. For both model-free and model-based policy search methods, we review their respective properties and their applicability to robotic systems.
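To illustrate the episodic model-free setting described above, the following is a minimal sketch (not an algorithm from the survey) of stochastic search over a policy parameter: a linear policy u = θ·x is rolled out on a hypothetical 1-D toy system, and the search distribution over θ is refit to the best-returning samples. The system dynamics, cost weights, and all hyperparameters here are illustrative assumptions.

```python
import math
import random

def rollout(theta, x0=1.0, horizon=10):
    """Return of the linear policy u = theta * x on a toy 1-D system
    (illustrative dynamics: x' = x + u, quadratic cost as negative reward)."""
    x, ret = x0, 0.0
    for _ in range(horizon):
        u = theta * x
        x = x + u                      # toy deterministic dynamics
        ret -= x * x + 0.1 * u * u     # penalize state error and control effort
    return ret

def policy_search(iters=30, pop=50, elites=10, seed=0):
    """Episodic stochastic search over the policy parameter theta:
    sample parameters, evaluate each by a rollout, refit the
    search distribution to the elite samples."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    for _ in range(iters):
        thetas = [rng.gauss(mu, sigma) for _ in range(pop)]
        thetas.sort(key=rollout, reverse=True)    # best returns first
        best = thetas[:elites]
        mu = sum(best) / elites                   # refit mean to elites
        sigma = max(1e-3, math.sqrt(
            sum((t - mu) ** 2 for t in best) / elites))
    return mu
```

Each outer iteration corresponds to a batch of sampled trajectories; in the model-based setting, `rollout` would instead simulate trajectories with a dynamics model learned from real interaction data.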
Erratum for 3.3.2.2 Analytic policy gradients (available for journal subscribers)
Submitted By: Marc Deisenroth, Technische Universität Darmstadt, marc@ias.tu-darmstadt.de. Date Accepted: 9/23/2013