BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Memento EPFL//
BEGIN:VEVENT
SUMMARY:IC Colloquium: Bridging Science and AI: Towards Building AI Algori
 thms for Real-World
DTSTART:20220413T100000
DTEND:20220413T110000
DTSTAMP:20260511T165351Z
UID:f5e31a1ccb0eefb2e727c6ad7895532e6d5c54e79f7a87b915670b46
CATEGORIES:Conferences - Seminars
DESCRIPTION:By: Caglar Gulcehre - DeepMind\n\nAbstract\nMy main research
  interest is building robust AI algorithms that can learn to reason
  efficiently from multi-modal data\, using their own experiences or
  the experiences of other agents\, with the ability to adapt to change
  and improve themselves (continual learning). I tackle this problem
  with deep reinforcement learning (RL)\, which allows agents to learn
  by trial and error from their own experiences gathered through
  interaction with an environment (online) or from the experiences of
  other agents (offline). I argue that the real-world impact of deep RL
  has been limited\, and I lay out some of the challenges. I identify AI
  for Science as one of the most promising directions where deep
  learning and RL algorithms can make a positive social impact on
  real-world problems. I suggest that offline RL and imitation learning
  are crucial components for bridging science and AI and for building
  machines that develop broadly intelligent behaviors. Since
  environment interactions in the real world can be costly or unsafe\,
  and realistic simulations may not be available\, offline RL is a
  promising way to learn systems that reason guided by a feedback
  signal. I will show that imitation learning can complement offline RL
  when environment interactions are possible but exploration and credit
  assignment remain challenging\, or when the environment provides no
  clear reward signal. The field has lacked large-scale\, challenging
  offline RL benchmarks to track progress. I will present the offline RL
  benchmarks we have released or are releasing\, such as RL Unplugged
  and StarCraft II Unplugged. On these large-scale benchmarks\, we
  identified that policy improvement operators can be harmful during
  offline RL training. I will discuss severa
 l offline RL approaches proposed to address this\, such as Regularized
  Behavior Value Estimation and Offline Actor-Critic\, which share the
  same core idea: limiting the number of policy improvement steps when
  learning a policy from offline data. We show that borrowing ideas from
  imitation learning makes it possible to learn complicated control tasks
  with offline RL without rewards\, using generative adversarial imitation
  learning. Moreover\, in critic regularized regression\, a critic filters
  out bad or dangerous actions for the policy network\, an approach we
  call selective imitation\, making it possible to learn high-dimensional
  and partially observable policies. We showed that offline RL and
  imitation learning can be scaled up to challenging\, partially
  observable real-world environments and outperform supervised learning
  approaches. Finally\, I will discuss open research problems and exciting
  challenges in deep learning and RL\, with a vision of building AI
  algorithms for scientist assistants.\n\nBio\nCaglar Gulcehre (CG) is a
  senior research scientist at DeepMind. He completed his Ph.D. under the
  supervision of Yoshua Bengio at MILA (Quebec AI Institute). His research
  interests are reinforcement learning (RL)\, deep learning\,
  representation learning\, natural language understanding (NLP)\, and\,
  more recently\, AI for Science. CG is currently working on building
  general\, efficient\, and robust agents that can learn from a feedback
  signal (often weak\, sparse\, and noisy) while utilizing unlabeled data
  available in an imperfect environment. CG works to improve the
  scientific understanding of existing algorithms and to develop new ones
  that enable real-world applications with positive social impact. When
  working on algorithmic solutions\, he enjoys approaching problems with
  cross-disciplinary insights and is often inspired by neuroscience\,
  biology\, and the cognitive sciences.\n\nCG serves as an action editor
  for th
 e TMLR journal and as an area chair and reviewer for major machine
  learning conferences and journals such as JMLR\, Nature\, TPAMI\, ICML\,
  ICLR\, NeurIPS\, and AISTATS. He has published at numerous influential
  conferences and journals\, including Nature\, JMLR\, NeurIPS\, ICML\,
  ICLR\, ACL\, EMNLP\, ECML\, and IJCNN. His work received the best paper
  award at the NeurIPS 2015 workshop on Nonconvex Optimization and an
  honorable mention for best paper at ICML 2019. CG co-organized the
  Science and Engineering of Deep Learning workshops at NeurIPS and ICLR.
  He is currently co-organizing a workshop on "Setting ML Evaluation
  Standards to Accelerate Progress" at ICLR 2022 and a CRAFT workshop on
  "Values and Science of Deep Learning" at ACM FAccT 2022. Throughout his
  career\, CG has actively mentored through initiatives such as AIMS\, the
  Deep Learning Indaba\, and DeepMind Scholars.\n\nMore information
LOCATION:BC 420 https://plan.epfl.ch/?room==BC%20420 https://epfl.zoom.us/
 j/66059888068?pwd=NkJYdk1JMlp4L0N6dncyUWFYRDkwUT09
STATUS:CANCELLED
END:VEVENT
END:VCALENDAR
