Landscape of Policy Optimization for Finite Horizon MDPs with General State and Action Spaces

Event details
Date | 23.05.2025 |
Hour | 13:30 › 14:30 |
Speaker | Yifan HU (EPFL) |
Location | MA C1 517 |
Category | Conferences - Seminars |
Event Language | English |
Policy gradient methods are widely used in reinforcement learning. Yet, the nonconvexity of policy optimization poses significant challenges to understanding the global convergence of policy gradient methods. For a class of finite-horizon Markov Decision Processes (MDPs) with general state and action spaces, we develop a framework that provides a set of easily verifiable assumptions ensuring that the policy optimization problem satisfies the Kurdyka-Łojasiewicz (KŁ) condition. Leveraging the KŁ condition, policy gradient methods converge to the globally optimal policy at a non-asymptotic rate despite nonconvexity. Our results find applications in various control and operations models, including entropy-regularized tabular MDPs, Linear Quadratic Regulator (LQR) problems, stochastic inventory models, and stochastic cash balance problems, for which we show that stochastic policy gradient methods obtain an ϵ-optimal policy with a sample size of O(ϵ⁻¹), polynomial in the planning horizon. Our results establish the first sample complexity guarantees in the literature for multi-period inventory systems with Markov-modulated demands and for stochastic cash balance problems.
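For context, a common special case of the KŁ condition appearing in such analyses is a gradient-dominance (Polyak-Łojasiewicz type) inequality; the exact form, exponent, and constants used in the talk may differ. A minimal sketch, writing J(θ) for the finite-horizon objective maximized over a policy parameter θ, with a stochastic policy gradient step using an unbiased gradient estimate:

\[
J(\theta^\star) - J(\theta) \;\le\; C \,\|\nabla_\theta J(\theta)\|^2 \quad \text{for all } \theta,
\qquad
\theta_{k+1} = \theta_k + \eta_k \,\hat g_k, \quad \mathbb{E}[\hat g_k] = \nabla_\theta J(\theta_k).
\]

Under an inequality of this kind, any stationary point is globally optimal, which is how gradient-type methods can achieve global convergence rates despite the nonconvexity of the objective.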
Practical information
- General public
- Free
Organizer
- Prof. Nicolas Boumal
Contact
- Nicolas Boumal
- Séverine Eggli