Landscape of Policy Optimization for Finite Horizon MDPs with General State and Action Spaces

Event details
Date | 23.05.2025 |
Hour | 13:30 › 14:30 |
Speaker | Yifan HU (EPFL) |
Location | MA C1 517 |
Category | Conferences - Seminars |
Event Language | English |
Policy gradient methods are widely used in reinforcement learning. Yet, the nonconvexity of policy optimization poses significant challenges to understanding the global convergence of policy gradient methods. For a class of finite-horizon Markov Decision Processes (MDPs) with general state and action spaces, we develop a framework that provides a set of easily verifiable assumptions ensuring that the policy optimization problem satisfies the Kurdyka-Łojasiewicz (KŁ) condition. Leveraging the KŁ condition, policy gradient methods converge to the globally optimal policy at a non-asymptotic rate despite nonconvexity. Our results find applications in various control and operations models, including entropy-regularized tabular MDPs, Linear Quadratic Regulator (LQR) problems, stochastic inventory models, and stochastic cash balance problems, for which we show that stochastic policy gradient methods obtain an ϵ-optimal policy with a sample size of O(ϵ⁻¹), polynomial in the planning horizon. Our results establish the first sample complexity guarantees in the literature for multi-period inventory systems with Markov-modulated demands and for stochastic cash balance problems.
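For context, a common special case of the KŁ condition appearing in such analyses is a gradient-dominance (Polyak-Łojasiewicz type) inequality; the exact form, exponent, and constants used in the talk may differ. A minimal sketch, writing J(θ) for the finite-horizon objective maximized over a policy parameter θ, with a stochastic policy gradient step using an unbiased gradient estimate:

\[
J(\theta^\star) - J(\theta) \;\le\; C \,\|\nabla_\theta J(\theta)\|^2 \quad \text{for all } \theta,
\qquad
\theta_{k+1} = \theta_k + \eta_k \,\hat g_k, \quad \mathbb{E}[\hat g_k] = \nabla_\theta J(\theta_k).
\]

Under an inequality of this kind, any stationary point is globally optimal, which is how gradient-type methods can achieve global convergence rates despite the nonconvexity of the objective.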
Practical information
- General public
- Free
Organizer
- Prof. Nicolas Boumal
Contact
- Nicolas Boumal
- Séverine Eggli