NLP Seminar: An introduction to mechanistic interpretability

Event details

Date 03.12.2024
Hour 11:00–12:00
Speaker Michael Hanna
Location Online
Category Conferences - Seminars
Event Language English

Bio: Michael Hanna is a third-year PhD student at the University of Amsterdam's Institute for Logic, Language and Computation. His research focuses on understanding the abilities of pre-trained language models and linking these abilities to low-level mechanisms using causal methods. He is particularly interested in such language models' linguistic abilities and their connection to human linguistic competence.

Abstract: Despite large language models' (LLMs) growing capabilities and increasing real-world deployment, our understanding of how they work remains limited: we don't know how they store facts, "reason", or even just produce grammatical language. Answering these questions could have wide-ranging impacts, from increasing model controllability to revealing insights into how human language works. In this lecture, I will provide an introduction to the field of mechanistic interpretability, which seeks to reverse-engineer LLMs by providing low-level, causally grounded explanations of the mechanisms that LLMs use to solve tasks. In particular, I will focus on two big questions often targeted by mechanistic interpretability, namely "Which parts of my model are responsible for a given task?" and "What features does my model use to solve this task?", as well as the solutions—circuit analysis and sparse autoencoders—that mechanistic interpretability has offered. By the end of this lecture, I aim to have provided you with all the tools necessary to begin your own investigation into what makes LLMs work!
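For readers who want a concrete picture of one of the techniques named above, the sketch below shows a minimal sparse autoencoder over model activations. It is an illustrative sketch only, assuming PyTorch; the dimensions, sparsity coefficient, and the placeholder activations are assumptions, not the speaker's implementation.

```python
# Minimal sparse autoencoder (SAE) sketch over model activations.
# Assumption: `activations` stands in for residual-stream activations
# collected from an LLM; random placeholders are used here.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)   # activations -> feature space
        self.decoder = nn.Linear(d_hidden, d_model)   # features -> reconstruction

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))        # non-negative feature activations
        reconstruction = self.decoder(features)
        return reconstruction, features

# Illustrative dimensions: model width, and an overcomplete feature dictionary.
d_model, d_hidden = 768, 4 * 768
sae = SparseAutoencoder(d_model, d_hidden)
optimizer = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3                                       # sparsity penalty strength (assumed)

# Placeholder for activations that would be gathered from a language model.
activations = torch.randn(1024, d_model)

for step in range(100):
    recon, feats = sae(activations)
    # Reconstruction loss keeps features faithful; the L1 penalty keeps them sparse.
    loss = ((recon - activations) ** 2).mean() + l1_coeff * feats.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

After training on real activations, the learned feature directions (rows of the decoder) are the candidates one would inspect for interpretable, sparsely firing features.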

Practical information

  • Informed public
  • Free

Tags

NLP interpretability
