Variance reduction in reinforcement learning optimization

Event details

Date 25.08.2022
Hour 09:00-11:00
Speaker Mohammadsadegh Khorasani
Category Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Martin Jaggi
Thesis advisor: Prof. Negar Kiyavash
Thesis co-advisor: Prof. Matthias Grossglauser
Co-examiner: Prof. Patrick Thiran

Abstract
Variance-reduced gradient estimators for policy gradient methods have been a main focus of reinforcement learning research in recent years, as they accelerate the estimation process. In this report, I review recent work that adapts variance reduction techniques to the RL setting and reaches an $\epsilon$-first-order stationary point using $O(\epsilon^{-3})$ trajectories. Most previous work requires huge batch sizes or importance sampling techniques that can compromise the benefit of variance reduction. Moreover, I present ideas for carrying a recent variance reduction method from the optimization literature over to RL while using a batch size of $O(1)$ and no importance sampling. Furthermore, I discuss experimental results comparing the previous methods.
Finally, I will propose possible future directions for devising variance reduction methods well matched to the RL setting. I will also discuss practical issues that should be taken into account when implementing and evaluating optimization algorithms.
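
To give a flavor of the methods discussed, the display below sketches the recursive momentum (STORM-style) policy gradient estimator that the momentum-based methods build on; the notation here is my own rather than taken from any of the background papers:

$$ d_t \;=\; g(\tau_t;\theta_t) \;+\; (1-a_t)\,\bigl(d_{t-1} \;-\; w(\tau_t;\theta_{t-1},\theta_t)\,g(\tau_t;\theta_{t-1})\bigr), \qquad \theta_{t+1} \;=\; \theta_t + \eta_t\, d_t, $$

where $g(\tau;\theta)$ denotes a single-trajectory policy gradient estimate (e.g., REINFORCE/GPOMDP), $w(\tau_t;\theta_{t-1},\theta_t)$ is the importance weight correcting for the fact that $\tau_t$ is sampled under $\theta_t$ rather than $\theta_{t-1}$, $a_t \in (0,1]$ is the momentum parameter, and $\eta_t$ is the step size. Replacing the importance-weighted correction term with curvature information is what allows a batch size of $O(1)$ without importance sampling.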

Background papers
1- "Hessian Aided Policy Gradient" (Shen 2019 ). As the main reference paper in this area.
2- "Better SGD using Second-order Momentum"(Tran21). The paper that we applied for RL.
3- "Momentum-Based Policy Gradient Methods" (link). The most recent work on variance reduction in RL is based on STORM.
 

Practical information

  • General public
  • Free

Tags

EDIC candidacy exam
