AI Center - AI Fundamentals Series - Dr. Stanislav Fort, Google DeepMind

Event details

Date 28.11.2024
Hour 14:00 – 15:00
Speaker Dr. Stanislav Fort
Location Online
Category Conferences - Seminars
Event Language English

Title
Adversarial attacks as a baby version of A(G)I alignment

Abstract
Adversarial attacks pose a significant challenge to the robustness, reliability and alignment of deep neural networks, from simple computer vision models to hundred-billion-parameter language models. Despite their ubiquity, both our theoretical understanding of their character and ultimate causes and our ability to defend against them remain noticeably lacking. This talk examines the robustness of modern deep learning methods and the surprising scaling of attacks on them, and showcases several practical examples of transferable attacks on the largest closed-source vision-language models. Building on biological insights and new empirical evidence, I will introduce our solution proposed in [1], in which we take a step towards aligning the implicit human and the explicit machine vision representations, closely connecting interpretability and robustness. I will conclude with a direct analogy between the problem of adversarial examples and the much larger task of general AI alignment.

[1] Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness. Stanislav Fort, Balaji Lakshminarayanan
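For readers unfamiliar with the attacks discussed in the abstract, the short sketch below illustrates the classic fast gradient sign method (FGSM): a single signed-gradient step that perturbs an input image so as to increase a classifier's loss. It is a minimal illustration of the general idea only, not the defense or attacks of [1]; the model, image, label and epsilon are hypothetical placeholders.

# Minimal FGSM sketch (illustrative only; not the method of [1]).
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=8 / 255):
    # Take one signed-gradient step that increases the classification loss,
    # then clamp the result back to the valid pixel range [0, 1].
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()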

Bio 
Stanislav Fort is a senior research scientist at Google DeepMind, specializing in robustness, interpretability and safety. He received his PhD in 2022 from Stanford University, where he was advised by Prof. Surya Ganguli. Previously, Stanislav spent time at Google Brain as an AI Resident, worked on the Claude model at Anthropic, and led the language model team at Stability AI. He holds Bachelor's and Master's degrees in theoretical physics from the University of Cambridge.

Academic publications: https://scholar.google.com/citations?user=eu2Kzn0AAAAJ&hl=en&oi=ao
Personal website: https://stanislavfort.com/

Practical information

  • Informed public
  • Registration required
  • This event is internal

Tags

SB, STI, IC, Intelligence artificielle, Artificial intelligence, AI, ML, Machine Learning, neural networks, Adversarial attacks
