Mathematical Reasoning with Large Language Models
Event details
Date | 17.09.2024 |
Hour | 15:00 › 17:00 |
Speaker | Anja Surina |
Location | |
Category | Conferences - Seminars |
EDIC candidacy exam
Exam president: Prof. Emmanuel Abbé
Thesis advisor: Prof. Caglar Gulcehre
Co-examiner: Prof. Antoine Bosselut
Abstract
The performance of state-of-the-art large language
models (LLMs) is impressive, particularly in their mastery of knowledge.
However, their reasoning abilities lag behind and exhibit many
failure modes. In this write-up, we explore approaches for
improving the reasoning capabilities of LLMs, particularly in
mathematical problem-solving. We will focus on iterative self-improvement
methods that do not rely on expert-generated data.
We present and analyze three papers: the first demonstrates
how LLMs can be used to make progress on open mathematical
problems by searching in the space of functions. The second
paper shows how LLMs can be trained on their own discovered
solutions, using both correct and incorrect outputs for learning.
Finally, with the third paper, we introduce step-wise rewards,
which are used to provide denser training signals to the LLM.
The write-up concludes with a discussion of our research plan
to combine LLMs with reinforcement learning (RL) for scientific
discovery and reasoning.
Background papers
Practical information
- General public
- Free