By Marc Peter Deisenroth, Technische Universität Darmstadt, Germany and Imperial College London, UK, marc@ias.tu-darmstadt.de | Gerhard Neumann, Technische Universität Darmstadt, Germany, neumann@ias.tu-darmstadt.de | Jan Peters, Technische Universität Darmstadt, Germany and Max Planck Institute for Intelligent Systems, Germany, peters@ias.tu-darmstadt.de
Policy search is a subfield of reinforcement learning that focuses on finding good parameters for a given policy parametrization. It is well suited for robotics because it can cope with high-dimensional state and action spaces, one of the main challenges in robot learning. We review recent successes of both model-free and model-based policy search in robot learning.
Model-free policy search is a general approach to learning policies based on sampled trajectories. We classify model-free methods based on their policy evaluation strategy, policy update strategy, and exploration strategy, and present a unified view of existing algorithms. Learning a policy is often easier than learning an accurate forward model, and, hence, model-free methods are more frequently used in practice. However, each sampled trajectory requires interaction with the robot, which can be time-consuming and challenging in practice. Model-based policy search addresses this problem by first learning a simulator of the robot's dynamics from data. Subsequently, the simulator generates trajectories that are used for policy learning. For both model-free and model-based policy search methods, we review their respective properties and their applicability to robotic systems.
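To illustrate the episodic model-free setting described above, the following is a minimal sketch (not an algorithm from the survey) of stochastic search over a policy parameter: a linear policy u = θ·x is rolled out on a hypothetical 1-D toy system, and the search distribution over θ is refit to the best-returning samples. The system dynamics, cost weights, and all hyperparameters here are illustrative assumptions.

```python
import math
import random

def rollout(theta, x0=1.0, horizon=10):
    """Return of the linear policy u = theta * x on a toy 1-D system
    (illustrative dynamics: x' = x + u, quadratic cost as negative reward)."""
    x, ret = x0, 0.0
    for _ in range(horizon):
        u = theta * x
        x = x + u                      # toy deterministic dynamics
        ret -= x * x + 0.1 * u * u     # penalize state error and control effort
    return ret

def policy_search(iters=30, pop=50, elites=10, seed=0):
    """Episodic stochastic search over the policy parameter theta:
    sample parameters, evaluate each by a rollout, refit the
    search distribution to the elite samples."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    for _ in range(iters):
        thetas = [rng.gauss(mu, sigma) for _ in range(pop)]
        thetas.sort(key=rollout, reverse=True)    # best returns first
        best = thetas[:elites]
        mu = sum(best) / elites                   # refit mean to elites
        sigma = max(1e-3, math.sqrt(
            sum((t - mu) ** 2 for t in best) / elites))
    return mu
```

Each outer iteration corresponds to a batch of sampled trajectories; in the model-based setting, `rollout` would instead simulate trajectories with a dynamics model learned from real interaction data.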
Erratum for 3.3.2.2 Analytic policy gradients (available for journal subscribers)
Submitted By: Marc Deisenroth, Technische Universität Darmstadt, marc@ias.tu-darmstadt.de. Date Accepted: 9/23/2013