AI Center - AI Fundamentals Series - Dr. Stanislav Fort, Google DeepMind

Event details

Date	28.11.2024
Hour	14:00 - 15:00
Speaker	Dr. Stanislav Fort
Location	ELE 117 AI Center Lounge, Online
Category	Conferences - Seminars
Event Language	English

Title
Adversarial attacks as a baby version of A(G)I alignment

Abstract
Adversarial attacks pose a significant challenge to the robustness, reliability and alignment of deep neural networks from simple computer vision to hundred-billion-parameter language models. Despite their ubiquitous nature, our theoretical understanding of their character and ultimate causes, as well as our ability to successfully defend against them, are noticeably lacking. This talk examines the robustness of modern deep learning methods and the surprising scaling of attacks on them, and showcases several practical examples of transferable attacks on the largest closed-source vision-language models out there. Building on biological insights and new empirical evidence, I will introduce our solution proposed in [1], in which we make a step towards the alignment of the implicit human and the explicit machine vision representations, closely connecting interpretability and robustness. I will conclude with a direct analogy between the problem of adversarial examples and the much larger task of general AI alignment.

[1] Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness. Stanislav Fort, Balaji Lakshminarayanan

Bio
Stanislav Fort is a senior research scientist at Google DeepMind, specializing in robustness, interpretability and safety. He received his PhD in 2022 from Stanford University with Prof. Surya Ganguli. In the past, Stanislav spent time at Google Brain as an AI Resident, worked on the Claude model at Anthropic, and led the language model team at Stability AI. He received his Bachelor's and Master's degrees in theoretical physics from the University of Cambridge.

Academic publications: https://scholar.google.com/citations?user=eu2Kzn0AAAAJ&hl=en&oi=ao
Personal website: https://stanislavfort.com/

Practical information

Informed public
Registration required
This event is internal

Organizer

EPFL AI Center

Contact

Nicolas Machado
