Instance-Optimal Algorithms for Pure Exploration in Reinforcement Learning

Event details

Date 01.12.2025 02.12.2025
Hour 11:15 - 12:00
Speaker Cyrille Kone, PhD, University of Lille
Location
Category Conferences - Seminars
Event Language English

Abstract
In online reinforcement learning, pure exploration aims to identify an optimal policy after a learning phase with minimal sample complexity, in contrast to regret minimization, which focuses on performance during learning. We study instance-dependent lower bounds for this problem, which take the form of a two-player zero-sum game between an explorer choosing a behavior policy and nature selecting an alternative MDP. We propose a computationally efficient algorithm based on posterior sampling that matches this lower bound in the small-error regime, bypassing the hardness of computing best responses. We further discuss extensions to multi-agent reinforcement learning, where the goal is to identify strategic equilibria such as Nash equilibria in unknown environments.

Biography
Cyrille Kone is a PhD candidate in Computer Science at the University of Lille and Inria within the Scool team, supervised by Prof. Emilie Kaufmann and Prof. Laura Richert. His research focuses on the theoretical foundations of sequential decision-making, with emphasis on pure exploration in bandits and reinforcement learning, instance-optimal algorithm design, and multi-objective optimization. His work has been published at top machine learning venues including NeurIPS, ICML, and AISTATS. He will defend his PhD in December 2025.