Policy gradient methods for Reinforcement Learning

Event details

Date 27.08.2018
Hour 14:00 - 16:00
Speaker Paul Rolland
Category Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Olivier Lévêque
Thesis advisor: Prof. Volkan Cevher
Co-examiner: Prof. Wulfram Gerstner

Abstract
We consider the problem of policy optimization in Reinforcement Learning via policy gradient methods. We first show that it is possible to design a theoretical update rule for the policy parameters that converges to an optimal policy. We then present a state-of-the-art algorithm, called "Trust Region Policy Optimization" (TRPO), that aims to approximate this update rule. Finally, we present a smart way of parametrizing the policy, which connects standard tabular methods with policy gradient methods.
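
For context, below is a minimal sketch of the vanilla policy gradient (REINFORCE) update that the methods above build on: a softmax policy whose parameters are nudged along the gradient of the log-probability of sampled actions, weighted by the return. The toy environment, hyperparameters, and all names are illustrative assumptions, not material from the talk or the background papers.

```python
# Minimal REINFORCE sketch on an assumed toy two-state MDP.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
theta = np.zeros((n_states, n_actions))  # policy parameters (logits)

def step(s, a):
    # Toy dynamics (assumption): action 1 in state 0 yields reward 1
    # and moves to state 1; everything else returns to state 0 with 0.
    if s == 0 and a == 1:
        return 1, 1.0
    return 0, 0.0

def policy(s):
    # Softmax over the logits of state s.
    logits = theta[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def rollout(horizon=10):
    s, traj = 0, []
    for _ in range(horizon):
        a = rng.choice(n_actions, p=policy(s))
        s_next, r = step(s, a)
        traj.append((s, a, r))
        s = s_next
    return traj

alpha, gamma = 0.1, 0.9  # step size and discount (assumed values)
for episode in range(500):
    traj = rollout()
    G = 0.0
    # Walk the trajectory backwards, accumulating the return G_t, and
    # apply theta <- theta + alpha * G_t * grad log pi(a_t | s_t).
    for s, a, r in reversed(traj):
        G = r + gamma * G
        grad_log = -policy(s)          # gradient of log-softmax w.r.t. logits
        grad_log[a] += 1.0             # one-hot term for the taken action
        theta[s] += alpha * G * grad_log

print("Learned action probabilities in state 0:", policy(0))
```

TRPO replaces the plain step size above with a trust-region constraint on how far the updated policy may move from the current one.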

Background papers
Policy Gradient Methods for Reinforcement Learning with Function Approximation, by Sutton, R., et al.
Trust Region Policy Optimization, by Schulman, J., et al.
Value Iteration Networks, by Tamar, A., et al.

Practical information

  • General public
  • Free

Tags

EDIC candidacy exam
