Towards Understanding Large Language Models

Event details

Date	12.06.2026
Hour	15:30 - 17:30
Speaker	Mark Rofin
Location	BC 329
Category	Conferences - Seminars

EDIC candidacy exam
Exam president: Prof. Robert West
Thesis advisor: Prof. Nicolas Flammarion
Co-examiner: Prof. Martin Jaggi

Abstract
LLM interpretability is a research area that aims to understand the inner workings of Large Language Models and the mechanisms they use to solve tasks. However, mainstream interpretability methods are usually static: they operate on a frozen model and do not fully exploit information about the data or the gradient signal during training. In this write-up, I outline a developmental perspective on interpretability, whose main idea is to analyze pretraining and finetuning statistics in addition to static evaluations. I discuss three papers that implement different aspects of this idea: Michaud et al. (2023), Aden-Ali et al. (2026), and Vafa et al. (2025).

Selected papers
https://arxiv.org/abs/2303.13506 -- The Quantization Model of Neural Scaling
https://arxiv.org/abs/2602.04863 -- Subliminal Effects in Your Data: A General Mechanism via Log-Linearity
https://arxiv.org/abs/2507.06952 -- What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models

Practical information

General public
Free

Contact

[email protected]
