BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Memento EPFL//
BEGIN:VEVENT
SUMMARY:Emergent Capabilities in Modern Sequence Models: Phase Transitions
 \, Memory in Shallow Transformers\, and Bidirectional State-Space Architec
 tures.
DTSTART:20250617T140000
DTEND:20250617T160000
DTSTAMP:20260526T053507Z
UID:81143e6c09a8d9ab70c6db0f3948b2615cbf82be0e558f6e95e47643
CATEGORIES:Conferences - Seminars
DESCRIPTION:Fabrizio Boncoraglio\nEDIC candidacy exam\nExam president: Pro
 f. Michael Gastpar\nThesis advisor: Prof. Lenka Zdeborova\nCo-examiner: Pr
 of. Matthieu Wyart\n\nAbstract\nRecent theory has begun to expose the stat
 istical principles that underlie modern sequence models. Neural sequence m
 odeling has recently progressed along three complementary axes: (i) high-
 dimensional analyses provide a solvable statistical-physics framework fo
 r attention models. Cui et al. reveal abrupt semantic-positional phase tr
 ansitions in low-rank attention\, offering closed-form generalization pre
 dictions as data scale varies\;\n(ii) associative-
 memory studies prove that shallow Transformers can store O(parameters) fac
 ts. Nichani et al. show that shallow Transformers achieve near-optimal fa
 ctual storage by allocating capacity between self-attention and MLP bloc
 ks\;\n(iii) a unified matrix-mixer perspective covers most sequence mixer
 s\, including attention. Hwang et al. introduce Hydra\, a bidirectio
 nal\, quasiseparable state-space model (SSM) that matches Transformer acc
 uracy under certain conditions at linear inference cost. The objective o
 f this write-up is to knit those axes into a single statistical framewor
 k. Bu
 ilding on these insights and on our own work in Boncoraglio et al.\, we pr
 opose an integrated framework that (a) explains when and why models transi
 tion from positional heuristics to semantic abstraction\, (b) leverages du
 al associative memories to guarantee linear-in-parameters factual recall\,
  (c) extends quasiseparable mixers to support data-dependent parameterizat
 ion with provable memory efficiency and (d) integrates these results and h
 ighlights further perspectives.\nThe resulting roadmap connects phase-tr
 ansition theory\, memory capacity and structured mixers\, charting a princ
 ipled path toward scalable\, memory-rich sequence models.\n\nSelected pape
 rs\n- A Phase Transition between Positional and Semantic Learning in a Sol
 vable Model of Dot-Product Attention: https://arxiv.org/pdf/2402.03902\n- 
 Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers:
  https://arxiv.org/pdf/2407.09941\n- Understanding Factual Recall in Trans
 formers via Associative Memories: https://arxiv.org/pdf/2412.06538
LOCATION:PH H3 33 https://plan.epfl.ch/?room==PH%20H3%2033
STATUS:CONFIRMED
END:VEVENT
END:VCALENDAR
