Towards Understanding Large Language Models

Event details

Date 12.06.2026
Hour 15:30-17:30
Speaker Mark Rofin
Location
Category Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Robert West
Thesis advisor: Prof. Nicolas Flammarion
Co-examiner: Prof. Martin Jaggi

Abstract
LLM interpretability is a research area that aims to understand the inner workings of Large Language Models and the mechanisms they use to solve tasks. However, mainstream interpretability methods are usually static: they operate on a frozen model and do not fully use information about the data or the gradient signal during training. In this write-up, I outline a developmental perspective on interpretability, whose main idea is to analyze pretraining and finetuning statistics in addition to static evaluations. Three papers implementing different aspects of this idea are discussed: Michaud et al. (2023); Aden-Ali et al. (2026); Vafa et al. (2025).

Selected papers
https://arxiv.org/abs/2303.13506 -- The Quantization Model of Neural Scaling
https://arxiv.org/abs/2602.04863 -- Subliminal Effects in Your Data: A General Mechanism via Log-Linearity
https://arxiv.org/abs/2507.06952 -- What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models
 

Practical information

  • General public
  • Free

Tags

EDIC candidacy exam
