Analyzing and Improving the Robustness of Deep Learning Models via Mechanistic Interpretability.

Event details
- Date: 26.08.2025
- Time: 13:00 – 15:00
- Speaker: Amel Abdelraheem
- Category: Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Martin Jaggi
Thesis advisor: Prof. Pascal Frossard
Co-examiner: Prof. Patrick Thiran
Abstract
Pre-trained models are increasingly used as foundations for downstream tasks, which makes their robustness, safety, and reliability all the more important. Recent literature highlights a striking property: deep neural networks exhibit an underlying linear structure. Phenomena such as linear mode connectivity show that independently trained models can be joined by low-loss paths in weight space, and these paths can be exploited to study and improve adversarial robustness. More recently, linear mode connectivity has been linked to the distinct internal mechanisms that models use to make predictions, helping to explain why fine-tuning alone may fail to remove spurious correlations. Building on this insight, model editing techniques, specifically task arithmetic, demonstrate that traversing directions in weight space can edit a model's behavior, strengthening or suppressing certain capabilities without full retraining and thereby enhancing robustness. Taken together, these results encourage examining pre-trained networks through a mechanistic lens, providing concrete tools for analyzing and steering deep learning models.
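To make the two weight-space operations mentioned above concrete, here is a minimal sketch in PyTorch: linear interpolation between two sets of weights (the path studied by linear mode connectivity) and task arithmetic (adding or subtracting the direction that fine-tuning moved the model). This is an illustrative sketch, not code from the talk; the toy network, the perturbation standing in for fine-tuning, and the scaling values are assumptions.

```python
import torch

def interpolate(state_a, state_b, alpha):
    """Point on the linear path (1 - alpha) * theta_A + alpha * theta_B.

    Linear mode connectivity holds when the loss stays low for every
    alpha in [0, 1] along this path.
    """
    return {k: (1 - alpha) * state_a[k] + alpha * state_b[k]
            for k in state_a}

def task_vector(pretrained, finetuned):
    """Direction in weight space traversed by fine-tuning: tau = theta_ft - theta_pre."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_vector(pretrained, tau, scale):
    """Edit the model along tau: scale > 0 strengthens the capability, scale < 0 suppresses it."""
    return {k: pretrained[k] + scale * tau[k] for k in pretrained}

# Toy usage with a small network; any nn.Module works the same way.
net = torch.nn.Sequential(torch.nn.Linear(8, 16),
                          torch.nn.ReLU(),
                          torch.nn.Linear(16, 2))
theta_pre = {k: v.clone() for k, v in net.state_dict().items()}
# Stand-in for fine-tuning: perturb the pre-trained weights slightly.
theta_ft = {k: v + 0.01 * torch.randn_like(v) for k, v in theta_pre.items()}

tau = task_vector(theta_pre, theta_ft)
net.load_state_dict(apply_task_vector(theta_pre, tau, scale=-1.0))  # negate tau to suppress
net.load_state_dict(interpolate(theta_pre, theta_ft, alpha=0.5))    # midpoint of the linear path
```

Sweeping alpha over [0, 1] and recording the loss at each point is the standard way to test whether two solutions are linearly mode connected; no retraining is involved in either operation.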
Selected papers
Practical information
- General public
- Free
Contact
- edic@epfl.ch