IC Colloquium: Bridging Science and AI: Towards Building AI Algorithms for Real-World
By: Caglar Gulcehre - DeepMind
Abstract
My main research interest is to build robust AI algorithms that can learn to reason efficiently from multi-modal data, using their own experiences or the experiences of other agents, and that can adapt to changes and improve themselves (continual learning). I tackle this problem with deep reinforcement learning (RL), which allows agents to learn by trial and error from their own experiences obtained through interaction with an environment (online), or by imitating the experiences of other agents (offline). I argue that the real-world impact of deep RL has so far been limited, and I lay out some of the underlying challenges. I identify AI for Science as one of the most promising directions in which deep learning and RL algorithms can make a positive social impact on real-world problems. I suggest that offline RL and imitation learning are crucial components for bridging science and AI and for building machines that develop broadly intelligent behaviors. Since environment interactions in the real world can be costly or unsafe, and realistic simulations may not be available, offline RL is a promising way to learn systems that can reason guided by a feedback signal. I will show that imitation learning can complement offline RL when environment interactions are possible but exploration and credit assignment remain challenging, or when there is no clear reward signal coming from the environment. The field has lacked large-scale, challenging offline RL benchmarks for tracking progress. I will present the offline RL benchmarks we have released or are releasing, such as RL Unplugged and StarCraft II Unplugged. On these large-scale, challenging benchmarks, we identified that policy improvement operators can be harmful during offline RL training. I will discuss several offline RL approaches proposed to address this, such as Regularized Behavior Value Estimation and the Offline Actor-Critic, which share the same core idea: limiting the number of policy improvement steps when learning a policy from offline data. We show that borrowing ideas from imitation learning, such as generative adversarial imitation learning, makes it possible to learn complicated control tasks with offline RL even without rewards. Moreover, in critic regularized regression, a critic filters out bad or dangerous actions for the policy network, an approach we call selective imitation, which makes it possible to learn policies in high-dimensional, partially observable settings. We showed that offline RL and imitation learning can be scaled up to challenging, partially observable real-world environments and can outperform supervised learning approaches. Finally, I will discuss open research problems and exciting challenges in deep learning and RL, with a vision of building AI algorithms that can assist scientists.
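To make the "selective imitation" idea mentioned above more concrete, here is a minimal, illustrative sketch of a critic-weighted behavioral cloning loss in the spirit of critic regularized regression: the critic's advantage estimate decides how strongly each dataset action is imitated, so clearly bad actions contribute little or nothing. The `policy` and `critic` interfaces, the function name, and the `beta` temperature are assumptions made for illustration, not the authors' exact implementation.

```python
import torch

def selective_imitation_loss(policy, critic, states, actions, beta=1.0, n_samples=4):
    """Critic-weighted behavioral cloning (a sketch of the CRR-style idea).

    Assumed interfaces:
      - policy(states) returns a torch.distributions.Distribution over actions
      - critic(states, actions) returns Q-value estimates for those actions
    """
    with torch.no_grad():
        q = critic(states, actions)  # Q(s, a) for the dataset actions

        # Baseline V(s): average Q over a few actions sampled from the current policy.
        sampled_actions = [policy(states).sample() for _ in range(n_samples)]
        v = torch.stack([critic(states, a) for a in sampled_actions]).mean(dim=0)

        advantage = q - v
        # Soft filtering: up-weight good actions, suppress bad ones.
        # A hard-filtering variant would use the indicator (advantage > 0).
        weights = torch.clamp(torch.exp(advantage / beta), max=20.0)

    # Imitation term: log-likelihood of the dataset actions under the policy.
    log_prob = policy(states).log_prob(actions)
    return -(weights * log_prob).mean()
```

With `beta` very small this reduces to imitating only actions the critic judges better than the policy's own, while with `beta` large it approaches plain behavioral cloning; this is one way to read "limiting policy improvement" when learning purely from offline data.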
Bio
Caglar Gulcehre (CG) is currently a senior research scientist at DeepMind. He completed his Ph.D. under the supervision of Yoshua Bengio at MILA (Quebec AI Institute). His research interests are reinforcement learning (RL), deep learning, representation learning, natural language understanding (NLP), and, more recently, AI for Science. CG is currently working on building general, efficient, and robust agents that can learn from a feedback signal (often weak, sparse, and noisy) while utilizing the unlabeled data available in an imperfect environment. CG works to improve the scientific understanding of existing algorithms and to develop new ones that enable real-world applications with positive social impact. When working on algorithmic solutions, he enjoys approaching problems with multi- and cross-disciplinary insights and is often inspired by neuroscience, biology, and cognitive science.
CG serves as an action editor for the TMLR journal and as an area chair and reviewer for major machine learning conferences and journals such as JMLR, Nature, TPAMI, ICML, ICLR, NeurIPS, and AISTATS. He has published at numerous influential conferences and in journals such as Nature, JMLR, NeurIPS, ICML, ICLR, ACL, EMNLP, ECML, and IJCNN. His work received the best paper award at the NeurIPS 2015 workshop on Nonconvex Optimization and an honorable mention for best paper at ICML 2019. CG co-organized the Science and Engineering of Deep Learning workshops at NeurIPS and ICLR. He is currently co-organizing a workshop on "Setting ML Evaluation Standards to Accelerate Progress" at ICLR 2022 and a CRAFT workshop on "Values and Science of Deep Learning" at ACM FAccT 2022. Throughout his career, CG has actively mentored through initiatives such as AIMS, the Deep Learning Indaba, and DeepMind Scholars.
More information
Practical information
- General public
- Free
Contact
- Host: Lenka Zdeborova