Policy gradient methods for Reinforcement Learning

Event details

Date 27.08.2018
Hour 14:00 - 16:00
Speaker Paul Rolland
Category Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Olivier Lévêque
Thesis advisor: Prof. Volkan Cevher
Co-examiner: Prof. Wulfram Gerstner

Abstract
We consider the problem of policy optimization in Reinforcement Learning via policy gradient methods. We first show that it is possible to design a theoretical update rule for the policy parameters that converges to an optimal policy. We then present a state-of-the-art algorithm, called "Trust Region Policy Optimization" (TRPO), that aims to approximate this update rule. Finally, we present a smart way of parametrizing the policy, which connects standard tabular methods with policy gradient methods.
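
For context, below is a minimal sketch of the vanilla policy gradient (REINFORCE) update that the methods above build on: a softmax policy whose parameters are nudged along the gradient of the log-probability of sampled actions, weighted by the return. The toy environment, hyperparameters, and all names are illustrative assumptions, not material from the talk or the background papers.

```python
# Minimal REINFORCE sketch on an assumed toy two-state MDP.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
theta = np.zeros((n_states, n_actions))  # policy parameters (logits)

def step(s, a):
    # Toy dynamics (assumption): action 1 in state 0 yields reward 1
    # and moves to state 1; everything else returns to state 0 with 0.
    if s == 0 and a == 1:
        return 1, 1.0
    return 0, 0.0

def policy(s):
    # Softmax over the logits of state s.
    logits = theta[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def rollout(horizon=10):
    s, traj = 0, []
    for _ in range(horizon):
        a = rng.choice(n_actions, p=policy(s))
        s_next, r = step(s, a)
        traj.append((s, a, r))
        s = s_next
    return traj

alpha, gamma = 0.1, 0.9  # step size and discount (assumed values)
for episode in range(500):
    traj = rollout()
    G = 0.0
    # Walk the trajectory backwards, accumulating the return G_t, and
    # apply theta <- theta + alpha * G_t * grad log pi(a_t | s_t).
    for s, a, r in reversed(traj):
        G = r + gamma * G
        grad_log = -policy(s)          # gradient of log-softmax w.r.t. logits
        grad_log[a] += 1.0             # one-hot term for the taken action
        theta[s] += alpha * G * grad_log

print("Learned action probabilities in state 0:", policy(0))
```

TRPO replaces the plain step size above with a trust-region constraint on how far the updated policy may move from the current one.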

Background papers
Policy Gradient Methods for Reinforcement Learning with Function Approximation, by Sutton, R., et al.
Trust Region Policy Optimization, by Schulman, J., et al.
Value Iteration Networks, by Tamar, A., et al.

Practical information

  • General public
  • Free

Tags

EDIC candidacy exam
