BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Memento EPFL//
BEGIN:VEVENT
SUMMARY:Landscape of Policy Optimization for Finite Horizon MDPs with Gene
 ral State and Action
DTSTART:20250523T133000
DTEND:20250523T143000
DTSTAMP:20260416T015513Z
UID:e876e5b6a4b57f1d53286ecce5517f56163c66da0a6a51eecdb48673
CATEGORIES:Conferences - Seminars
DESCRIPTION:Yifan HU (EPFL)\nPolicy gradient methods are widely used in re
 inforcement learning. Yet\, the nonconvexity of policy optimization pose
 s significant challenges to understanding the global convergence of pol
 icy gradient methods. For a class of finite-horizon Markov Decision Pro
 cesses (MDPs) with general state and action spaces\, we develop a frame
 work that provides a set of easily verifiable assumptions to ensure the
  Kurdyka-Łojasiewicz (KŁ) condition of the policy optimization. Leverag
 ing the KŁ condition\, policy gradient methods converge to the globally
  optimal policy with a non-asymptotic rate despite nonconvexity. Our re
 sults find applications in various control and operations models\, incl
 uding entropy-regularized tabular MDPs\, Linear Quadratic Regulator (LQ
 R) problems\, stochastic inventory models\, and stochastic cash balance
  problems\, for which we show that an ϵ-optimal policy can be obtained
  by stochastic policy gradient methods using a sample size of O(ϵ⁻¹) th
 at is polynomial in the planning horizon. Our results establish the fir
 st sample complexity guarantees in the literature for multi-period inve
 ntory systems with Markov-modulated demands and stochastic cash balance
  problems.
LOCATION:MA C1 517
STATUS:CONFIRMED
END:VEVENT
END:VCALENDAR
