Seminar by Prof. Huan Xu, Georgia Tech
Event details
Date | 17.12.2019 |
Hour | 14:30 › 16:00 |
Speaker | Prof. Huan Xu, Georgia Tech |
Location | |
Category | Conferences - Seminars |
"The online saddle point problem and its applications"
Abstract:
We study the online saddle point problem, an online learning problem where at each iteration a pair of actions need to be chosen without knowledge of the current and future payoff functions. The objective is to minimize the gap between the cumulative payoffs and the saddle point value of the aggregate payoff function, which we measure using a metric called “SP-regret”. This problem can be interpreted as finding the Nash equilibrium for the aggregate of a sequence of two-player zero-sum game. We propose an algorithm that achieves O(\sqrt{T}) SP-regret in the general case, and O(log(T)) SP-regret for the strongly convex-concave case. Based on these generic results, we then consider two specific applications, namely, online convex optimization with knapsacks problem, and zero-sum matrix game.
Bio:
Huan Xu is a senior staff engineer at Damo Academy of Alibaba Inc. He received his Ph. D. from McGill University in 2009. After a postdoc stint at UT-Austin, he was an assistant professor and then an associate professor (with tenure) at the National University of Singapore. He was an assistant professor at Georgia Institute of Technology, before joining Alibaba. He is interested in machine learning and high-dimensional statistics, robust optimization, and reinforcement learning. He has published over 90 papers in premier venues in operations research and machine learning. He is currently an associate editor of Mathematics of Operations Research, IEEE PAMI, and is an area chair of ICML and NIPS.
Abstract:
We study the online saddle point problem, an online learning problem where at each iteration a pair of actions need to be chosen without knowledge of the current and future payoff functions. The objective is to minimize the gap between the cumulative payoffs and the saddle point value of the aggregate payoff function, which we measure using a metric called “SP-regret”. This problem can be interpreted as finding the Nash equilibrium for the aggregate of a sequence of two-player zero-sum game. We propose an algorithm that achieves O(\sqrt{T}) SP-regret in the general case, and O(log(T)) SP-regret for the strongly convex-concave case. Based on these generic results, we then consider two specific applications, namely, online convex optimization with knapsacks problem, and zero-sum matrix game.
Bio:
Huan Xu is a senior staff engineer at Damo Academy of Alibaba Inc. He received his Ph. D. from McGill University in 2009. After a postdoc stint at UT-Austin, he was an assistant professor and then an associate professor (with tenure) at the National University of Singapore. He was an assistant professor at Georgia Institute of Technology, before joining Alibaba. He is interested in machine learning and high-dimensional statistics, robust optimization, and reinforcement learning. He has published over 90 papers in premier venues in operations research and machine learning. He is currently an associate editor of Mathematics of Operations Research, IEEE PAMI, and is an area chair of ICML and NIPS.
Practical information
- General public
- Free