Talk of Professor Nicolò Cesa-Bianchi (University of Milan)

Thumbnail

Event details

Date 09.10.2019
Hour 11:1513:15
Speaker Professor Nicolò Cesa-Bianchi
Location
Category Conferences - Seminars
TITLE: Nonstochastic Multiarmed Bandits with Unrestricted Delays

ABSTRACT:
We investigate multiarmed bandits with delayed feedback, where the delays need neither be identical nor bounded. We first prove that the "delayed" Exp3 achieves the [O(\sqrt{(KT + D)\ln K})]  regret bound conjectured in the case of variable, but bounded delays. Here, [K]  is the number of actions and [D]  is the total delay over [T]  rounds. We then introduce a new algorithm that lifts the requirement of bounded delays by using a wrapper that skips rounds with excessively large delays. The new algorithm maintains the same regret bound, but similar to its predecessor requires prior knowledge of [D]  and [T] . For this algorithm we then construct a novel doubling scheme that forgoes this requirement under the assumption that the delays are available at action time (rather than at loss observation time). This assumption is satisfied in a broad range of applications, including interaction with servers and service providers. The resulting oracle regret bound is of order [\min_\beta (|S_\beta|+\beta \ln K + (KT + D_\beta)\/\beta)] , where [|S_\beta|]  is the number of observations with delay exceeding [\beta] , and [D_\beta] is the total delay of observations with delay below [\beta] . The bound relaxes to [O(\sqrt{(KT + D)\ln K})] , but we also provide examples where [D_\beta \ll D]  and the oracle bound has a polynomially better dependence on the problem parameters..   Joint work with Tobias Thune and Yevgeny Seldin  

BIO:
Nicolò Cesa-Bianchi is professor of Computer Science at the University of Milan, Italy. His main research areas are: design and analysis of machine learning algorithms; algorithms for multiarmed bandit problems with applications to personalized recommendations and online auctions; graph analytics with applications to social networks and bioinformatics. On these topics he published over 130 papers. He is co-author of the monographs "Prediction, Learning, and Games" and "Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems". He served as President of the Association for Computational Learning, and is the recipient of a Google Research Award, a Xerox Foundation UAC Award, a Criteo Faculty Award, and a Google Focused Award.

Practical information

  • Expert
  • Registration required

Organizer

  • Porfessor Volkan Cevher

Contact

  • Gosia Baltaian

Event broadcasted in

Share