Talk of Professor Nicolò Cesa-Bianchi (University of Milan)

Event details
Date | 09.10.2019 |
Hour | 11:15 › 13:15 |
Speaker | Professor Nicolò Cesa-Bianchi |
Location | |
Category | Conferences - Seminars |
TITLE: Nonstochastic Multiarmed Bandits with Unrestricted Delays
ABSTRACT:
We investigate multiarmed bandits with delayed feedback, where the delays need neither be identical nor bounded. We first prove that the "delayed" Exp3 achieves the [O(\sqrt{(KT + D)\ln K})] regret bound conjectured in the case of variable, but bounded delays. Here, [K] is the number of actions and [D] is the total delay over [T] rounds. We then introduce a new algorithm that lifts the requirement of bounded delays by using a wrapper that skips rounds with excessively large delays. The new algorithm maintains the same regret bound, but similar to its predecessor requires prior knowledge of [D] and [T] . For this algorithm we then construct a novel doubling scheme that forgoes this requirement under the assumption that the delays are available at action time (rather than at loss observation time). This assumption is satisfied in a broad range of applications, including interaction with servers and service providers. The resulting oracle regret bound is of order [\min_\beta (|S_\beta|+\beta \ln K + (KT + D_\beta)\/\beta)] , where [|S_\beta|] is the number of observations with delay exceeding [\beta] , and [D_\beta] is the total delay of observations with delay below [\beta] . The bound relaxes to [O(\sqrt{(KT + D)\ln K})] , but we also provide examples where [D_\beta \ll D] and the oracle bound has a polynomially better dependence on the problem parameters.. Joint work with Tobias Thune and Yevgeny Seldin
BIO:
Nicolò Cesa-Bianchi is professor of Computer Science at the University of Milan, Italy. His main research areas are: design and analysis of machine learning algorithms; algorithms for multiarmed bandit problems with applications to personalized recommendations and online auctions; graph analytics with applications to social networks and bioinformatics. On these topics he published over 130 papers. He is co-author of the monographs "Prediction, Learning, and Games" and "Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems". He served as President of the Association for Computational Learning, and is the recipient of a Google Research Award, a Xerox Foundation UAC Award, a Criteo Faculty Award, and a Google Focused Award.
ABSTRACT:
We investigate multiarmed bandits with delayed feedback, where the delays need neither be identical nor bounded. We first prove that the "delayed" Exp3 achieves the [O(\sqrt{(KT + D)\ln K})] regret bound conjectured in the case of variable, but bounded delays. Here, [K] is the number of actions and [D] is the total delay over [T] rounds. We then introduce a new algorithm that lifts the requirement of bounded delays by using a wrapper that skips rounds with excessively large delays. The new algorithm maintains the same regret bound, but similar to its predecessor requires prior knowledge of [D] and [T] . For this algorithm we then construct a novel doubling scheme that forgoes this requirement under the assumption that the delays are available at action time (rather than at loss observation time). This assumption is satisfied in a broad range of applications, including interaction with servers and service providers. The resulting oracle regret bound is of order [\min_\beta (|S_\beta|+\beta \ln K + (KT + D_\beta)\/\beta)] , where [|S_\beta|] is the number of observations with delay exceeding [\beta] , and [D_\beta] is the total delay of observations with delay below [\beta] . The bound relaxes to [O(\sqrt{(KT + D)\ln K})] , but we also provide examples where [D_\beta \ll D] and the oracle bound has a polynomially better dependence on the problem parameters.. Joint work with Tobias Thune and Yevgeny Seldin
BIO:
Nicolò Cesa-Bianchi is professor of Computer Science at the University of Milan, Italy. His main research areas are: design and analysis of machine learning algorithms; algorithms for multiarmed bandit problems with applications to personalized recommendations and online auctions; graph analytics with applications to social networks and bioinformatics. On these topics he published over 130 papers. He is co-author of the monographs "Prediction, Learning, and Games" and "Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems". He served as President of the Association for Computational Learning, and is the recipient of a Google Research Award, a Xerox Foundation UAC Award, a Criteo Faculty Award, and a Google Focused Award.
Practical information
- Expert
- Registration required
Organizer
- Porfessor Volkan Cevher
Contact
- Gosia Baltaian