A game theoretic perspective on Reinforcement and Imitation Learning


Event details

Date 12.07.2022, 09:00–11:00
Speaker Luca Viano
Category Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Nicolas Boumal
Thesis advisor: Prof. Volkan Cevher
Co-examiner: Prof. Maryam Kamgarpour

The Proximal Point Method (PPM) enjoys favorable convergence properties (Rockafellar, 1976; Güler, 1991), but it can rarely be implemented in practice, since computing a single update can be as hard as solving the original problem. It is known, however, that proximal point can be implemented for linear losses, where it coincides with gradient descent in the Euclidean case and with mirror descent in the Bregman setup. This fact has been leveraged in the reinforcement learning community to develop algorithms such as Relative Entropy Policy Search (REPS) (Peters et al., 2010; Pacchiano et al., 2021) for the online RL setting, O-REPS for the adversarial MDP setting, and PRO-RL for the offline setting.
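The claim that proximal point reduces to gradient descent for linear losses can be checked directly. In this toy sketch (numbers are ours, not from the talk), the proximal update for f(x) = ⟨c, x⟩ is solved in closed form by setting the gradient of the regularized objective to zero, which yields exactly a gradient step:

```python
import numpy as np

# Toy illustration: for a linear loss f(x) = <c, x>, the proximal point update
#   x_{k+1} = argmin_x <c, x> + (1/(2*eta)) * ||x - x_k||^2
# has first-order condition c + (x - x_k)/eta = 0, i.e. x = x_k - eta*c,
# which is a plain gradient-descent step.
rng = np.random.default_rng(0)
c, x_k, eta = rng.standard_normal(3), rng.standard_normal(3), 0.5

x_prox = x_k - eta * c                       # closed-form proximal update
grad_at_prox = c + (x_prox - x_k) / eta      # gradient of the prox objective

# The first-order optimality condition holds exactly ...
assert np.allclose(grad_at_prox, 0.0)

# ... and x_prox is the global minimum of the strictly convex prox objective.
prox_obj = lambda x: c @ x + np.sum((x - x_k) ** 2) / (2 * eta)
for _ in range(5):
    p = 0.1 * rng.standard_normal(3)
    assert prox_obj(x_prox) <= prox_obj(x_prox + p)
```

In the Bregman setup the same computation goes through with the squared Euclidean distance replaced by a Bregman divergence, recovering a mirror descent step.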
In this document we observe that proximal point can still be implemented for the particular case of functions defined in max form, which is of interest for imitation learning. In this setting the proximal point update can still be implemented (approximately) while being genuinely different from mirror descent. We also revisit IQ-Learn (Garg et al., 2021), a recently proposed, efficient algorithm for imitation learning, from a proximal point perspective.
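The distinction above can be made concrete with a toy max-form objective (the numbers and the inner solver here are our own illustration, not the method from the talk): f(x) = max_i (⟨a_i, x⟩ + b_i), the shape that arises when imitation learning is written as a worst case over reward functions. The proximal step is approximated with a simple inner subgradient method, and it visibly differs from a single subgradient step on f:

```python
import numpy as np

# Hypothetical max-form objective: f(x) = max_i (<a_i, x> + b_i).
A = np.array([[1.0, 0.0], [-1.0, 0.5], [0.0, -1.0]])
b = np.array([0.0, 0.2, -0.1])
f = lambda x: np.max(A @ x + b)

eta, x_k = 2.0, np.array([1.0, 1.0])
prox_obj = lambda x: f(x) + np.sum((x - x_k) ** 2) / (2 * eta)

# Approximate proximal step: an inner subgradient loop on the regularised
# objective (the simplest possible inner solver, purely for illustration).
x = x_k.copy()
for t in range(1, 2001):
    i = np.argmax(A @ x + b)            # active piece gives a subgradient of f
    g = A[i] + (x - x_k) / eta          # subgradient of the prox objective
    x = x - g / (t + 10)                # diminishing step size
x_prox = x

# A plain subgradient step on f follows only the piece active at x_k ...
i0 = np.argmax(A @ x_k + b)
x_sub = x_k - eta * A[i0]

# ... while the proximal step accounts for the kinks of f it crosses,
# so the two updates genuinely differ here.
assert prox_obj(x_prox) < prox_obj(x_sub)
assert np.linalg.norm(x_prox - x_sub) > 0.5
```

This is only meant to show that, unlike the linear case, the max-form proximal update is not a single (sub)gradient or mirror descent step, which is the gap the document's approximate implementation addresses.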

Background papers
On the Convergence of the Proximal Point Algorithm for Convex Minimization, Osman Güler, 1991

Near Optimal Policy Optimization via REPS, Aldo Pacchiano et al., 2021

IQ-Learn: Inverse soft-Q Learning for Imitation, Divyansh Garg et al., 2021


Practical information

  • General public
  • Free
