Analyzing and Improving the Robustness of Deep Learning Models via Mechanistic Interpretability

Event details

Date 26.08.2025
Hour 13:00–15:00
Speaker Amel Abdelraheem
Location
Category Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Martin Jaggi
Thesis advisor: Prof. Pascal Frossard
Co-examiner: Prof. Patrick Thiran

Abstract
Pre-trained models are increasingly used as foundations for downstream tasks, making their robustness, safety, and reliability ever more important. Recent literature highlights a striking property: deep neural networks exhibit an underlying linear structure. Phenomena such as linear mode connectivity show that independently trained models can be joined by low-loss paths in weight space, and these paths can be exploited to study and improve adversarial robustness. More recently, linear mode connectivity has been linked to the distinct internal mechanisms that models use to make predictions, helping to explain why fine-tuning alone may fail to remove spurious correlations. Building on this insight, model-editing techniques, specifically task arithmetic, demonstrate that traversing directions in weight space can edit a model's behavior, strengthening or suppressing certain capabilities without full retraining and thereby enhancing robustness. Taken together, these results encourage examining pre-trained networks through a mechanistic lens, providing concrete tools for analyzing and steering deep learning models.
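The task-arithmetic idea mentioned in the abstract can be sketched in a few lines. This is a minimal illustration only: plain Python dicts stand in for model weight tensors, and the function names and toy values below are assumptions for exposition, not the implementation discussed in the exam.

```python
# Task arithmetic sketch: a "task vector" is the element-wise difference
# between fine-tuned and pre-trained weights; adding (or subtracting) a
# scaled task vector edits the model without retraining.

def task_vector(pretrained, finetuned):
    # Direction in weight space learned during fine-tuning.
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_vector(pretrained, tau, alpha=1.0):
    # alpha > 0 strengthens the capability; alpha < 0 suppresses it
    # (negation), e.g. to remove an unwanted behavior.
    return {k: pretrained[k] + alpha * tau[k] for k in pretrained}

# Toy weights (hypothetical values, not real model parameters):
pretrained = {"w": 1.0, "b": 0.5}
finetuned = {"w": 3.0, "b": 0.25}

tau = task_vector(pretrained, finetuned)
negated = apply_task_vector(pretrained, tau, alpha=-1.0)
```

With real models the same arithmetic is applied per-parameter over the networks' state dicts; the scaling coefficient alpha is typically tuned on held-out data.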

Selected papers

Practical information

  • General public
  • Free

Contact

  • edic@epfl.ch

Tags

EDIC candidacy exam