A game theoretic perspective on Reinforcement and Imitation Learning

Event details

Date	12.07.2022
Hour	09:00 › 11:00
Speaker	Luca Viano
Location	ELD 120
Category	Conferences - Seminars

EDIC candidacy exam
Exam president: Prof. Nicolas Boumal
Thesis advisor: Prof. Volkan Cevher
Co-examiner: Prof. Maryam Kamgarpour

Abstract
The Proximal Point Method (PPM) enjoys favorable convergence properties (Rockafellar, 1976; Gueler, 1991), but it can rarely be implemented in practice because a single update can be as hard as the original problem. It is known, however, that proximal point can be implemented for linear losses, where it coincides with gradient descent in the Euclidean case and with mirror descent in the Bregman setup. This fact has been leveraged in the reinforcement learning community to develop algorithms such as Relative Entropy Policy Search (REPS) (Peters et al., 2010; Pacchiano et al., 2021) for the online RL setting, O-REPS for the adversarial MDP setting, and PRO-RL for the offline setting.
In this document we observe that proximal point can still be implemented for functions defined in max form, a case of interest for imitation learning. In this setting, proximal point can be implemented (approximately) while remaining different from mirror descent. We also present IQ-Learn (Garg et al., 2021), a recently proposed, efficient algorithm for imitation learning, from a proximal point perspective.
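As a small illustration of the linear-loss case mentioned in the abstract, the sketch below writes out the two closed-form proximal point updates for a loss f(z) = <g, z>: the Euclidean one, which is a gradient descent step, and the entropic one on the probability simplex, which is the exponentiated-gradient form of mirror descent. The function names, the step size eta, and the choice of KL divergence as the Bregman divergence are assumptions made for this example; they are not taken from the talk or the background papers.

```python
import numpy as np


def prox_step_euclidean(x, g, eta):
    """Proximal point update for the linear loss f(z) = <g, z>.

    Solves argmin_z <g, z> + ||z - x||^2 / (2 * eta).
    Setting the gradient to zero gives z = x - eta * g, i.e. a gradient step.
    """
    return x - eta * g


def prox_step_entropic(p, g, eta):
    """Proximal point update on the simplex with KL(z || p) as Bregman divergence.

    Solves argmin_{z in simplex} <g, z> + KL(z || p) / eta.
    The solution is proportional to p * exp(-eta * g), i.e. a mirror descent
    (exponentiated gradient) step.
    """
    w = p * np.exp(-eta * (g - g.min()))  # constant shift cancels after normalizing
    return w / w.sum()


def prox_objective_euclidean(z, x, g, eta):
    """Objective minimized by prox_step_euclidean (illustrative helper)."""
    return g @ z + np.sum((z - x) ** 2) / (2 * eta)


def prox_objective_entropic(z, p, g, eta):
    """Objective minimized by prox_step_entropic (illustrative helper)."""
    return g @ z + np.sum(z * np.log(z / p)) / eta
```

Both objectives are strongly convex in z, so comparing the closed-form update against randomly drawn alternatives is a simple way to check that it is the minimizer.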

Background papers
On the convergence of the proximal point algorithm for convex minimisation, Osman Gueler, 1991
https://www.researchgate.net/publication/267054708_On_the_Convergence_of_the_Proximal_Point_Algorithm_for_Convex_Minimization

Near Optimal Policy Optimisation via REPS, Aldo Pacchiano et al., 2021
https://proceedings.neurips.cc/paper/2021/file/08d562c1eedd30b15b51e35d8486d14c-Paper.pdf

IQ-learn: Inverse soft-Q learning for imitation, Divyansh Garg et al., 2021
https://proceedings.neurips.cc/paper/2021/file/210f760a89db30aa72ca258a3483cc7f-Paper.pdf

Practical information

General public
Free

Contact

edic@epfl.ch
