IC Colloquium: Understanding and Improving Reasoning in Large Language Models
By: Nouha Dziri - Allen Institute for AI
IC Faculty candidate
Abstract
Despite the impressive capabilities of AI models, they present a striking paradox: they can solve Olympiad-level math problems while still making basic reasoning errors that even non-experts would avoid. This points to fundamental limitations in their ability to generalize.
In this talk, I will share key insights into the strengths and limitations of large language models, examine when reinforcement learning improves reasoning and when it struggles to generalize, and explore approaches to enhance their reasoning capabilities. I will also argue that many safety failures stem from underlying reasoning failures, and discuss how to make models more robust to adversarial attacks. I will conclude with a forward-looking research agenda: scaling reasoning through smarter algorithms and training recipes, rigorous evaluation frameworks to measure real progress, and ensuring safety at every stage as models become increasingly autonomous.
Bio
Nouha Dziri is a Senior Research Scientist at Ai2. Her research spans a wide range of problems in AI, with a focus on building and improving large language models (LLMs). She co-led the post-training effort for the OLMo models. Prior to that, she was a postdoctoral researcher working with Yejin Choi at Ai2. She has also interned at Google DeepMind, Microsoft Research, and Mila. She is the recipient of multiple awards, including Best Paper Awards at NeurIPS 2025 and NAACL 2025. Her work has been featured in leading media outlets including The Economist, TechCrunch, Le Monde, Science, and Quanta Magazine. She has delivered invited talks at top universities such as Stanford, Oxford, Cambridge, Edinburgh, Princeton, McGill, and CMU. She has also been a keynote speaker and a panelist at workshops at major AI conferences including NeurIPS, ICLR, and ICML. She earned her PhD from the University of Alberta and the Alberta Machine Intelligence Institute (Amii) in 2023.
More information
Practical information
- General public
- Free
Contact
- Host: Tanja Käser