Towards Understanding Large Language Models
Event details
| Date | 12.06.2026 |
| Hour | 15:30 - 17:30 |
| Speaker | Mark Rofin |
| Location | |
| Category | Conferences - Seminars |
EDIC candidacy exam
Exam president: Prof. Robert West
Thesis advisor: Prof. Nicolas Flammarion
Co-examiner: Prof. Martin Jaggi
Abstract
LLM interpretability is a research area that aims to understand the inner workings of Large Language Models and the mechanisms they use to solve tasks. However, mainstream interpretability methods are usually static: they operate on a frozen model and do not fully use information about the data or the gradient signal during training. In this write-up, I outline a developmental perspective on interpretability, whose main idea is to analyze pretraining and finetuning statistics in addition to static evaluations. Three papers implementing different aspects of this idea are discussed: Michaud et al. (2023); Aden-Ali et al. (2026); Vafa et al. (2025).
Selected papers
https://arxiv.org/abs/2303.13506 -- The Quantization Model of Neural Scaling
https://arxiv.org/abs/2602.04863 -- Subliminal Effects in Your Data: A General Mechanism via Log-Linearity
https://arxiv.org/abs/2507.06952 -- What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models
Practical information
- General public
- Free