Emergent Capabilities in Modern Sequence Models: Phase Transitions, Memory in Shallow Transformers, and Bidirectional State-Space Architectures.

Event details

Date 17.06.2025
Hour 14:00–16:00
Speaker Fabrizio Boncoraglio
Location
Category Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Michael Gastpar
Thesis advisor: Prof. Lenka Zdeborova
Co-examiner: Prof. Matthieu Wyart

Abstract
Recent theory has begun to expose the statistical principles that underlie modern sequence models. Neural sequence modeling has progressed along three complementary axes: (i) high-dimensional analyses provide a solvable statistical-physics framework for attention models; the work of Cui et al. reveals an abrupt phase transition between positional and semantic learning in low-rank dot-product attention, with closed-form generalization predictions as the amount of training data varies;
(ii) associative-memory studies prove that shallow Transformers can store a number of facts that scales linearly with their parameter count; Nichani et al. show that shallow Transformers achieve near-optimal factual storage by allocating capacity between the self-attention and MLP blocks; and
(iii) unified matrix-mixer perspectives frame most sequence mixers, including attention, as structured matrices; Hwang et al. introduce Hydra, a bidirectional state-space model (SSM) built on quasiseparable matrix mixers, which matches Transformer accuracy under certain conditions at linear inference cost. The objective of this write-up is to knit these axes into a single statistical framework. Building on these insights and on our own work in Boncoraglio et al., we propose an integrated framework that (a) explains when and why models transition from positional heuristics to semantic abstraction, (b) leverages dual associative memories to guarantee linear-in-parameters factual recall, (c) extends quasiseparable mixers to support data-dependent parameterization with provable memory efficiency, and (d) integrates these results and highlights further perspectives.
The resulting roadmap connects phase-transition theory, memory capacity and structured mixers, charting a principled path toward scalable, memory-rich sequence models.
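To make the three axes concrete, the short sketches below illustrate, in order, the positional-versus-semantic dichotomy in dot-product attention, outer-product associative memory, and a quasiseparable bidirectional mixer. The first is a minimal sketch of the dichotomy studied in the solvable attention model: each token carries a content block and a positional block, and a single tied low-rank query/key matrix determines whether the attention pattern depends on position alone or on token content. The block layout, the dimensions, and the two hand-built projection matrices are illustrative assumptions, not the construction analysed by Cui et al.

```python
import numpy as np

rng = np.random.default_rng(0)
L, dc = 6, 16                                   # sequence length, content dimension

def embed(content):
    # Concatenate a content block with one-hot positional encodings.
    return np.concatenate([content, np.eye(L)], axis=1)      # shape (L, dc + L)

def attention_pattern(tokens, Q):
    # Softmax attention with a tied (shared) query/key projection Q.
    z = tokens @ Q
    s = z @ z.T / np.sqrt(Q.shape[1])
    w = np.exp(s - s.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)

d = dc + L
Q_positional = np.zeros((d, L))                 # reads the positional block only
Q_positional[dc:, :] = 3.0 * np.eye(L)
Q_semantic = np.zeros((d, dc))                  # reads the content block only
Q_semantic[:dc, :] = np.eye(dc)

X1 = rng.standard_normal((L, dc))
X2 = rng.standard_normal((L, dc))

# Positional mechanism: the attention pattern ignores the content entirely.
same_pattern = np.allclose(attention_pattern(embed(X1), Q_positional),
                           attention_pattern(embed(X2), Q_positional))
# Semantic mechanism: the attention pattern changes with the content.
content_driven = not np.allclose(attention_pattern(embed(X1), Q_semantic),
                                 attention_pattern(embed(X2), Q_semantic))
print(same_pattern, content_driven)             # True True
```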
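The second is a minimal sketch of the associative-memory picture behind factual recall: facts are stored as a sum of outer products, and with nearly orthogonal keys a query retrieves its value with small cross-talk, so the number of reliably stored facts scales with the parameter count of the weight matrix. The dimensions and the random key/value construction are illustrative assumptions, not the Transformer construction of Nichani et al.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_facts = 256, 64

# Random unit-norm keys and values; random high-dimensional vectors are nearly
# orthogonal, which keeps cross-talk between stored facts small.
K = rng.standard_normal((n_facts, d))
K /= np.linalg.norm(K, axis=1, keepdims=True)
V = rng.standard_normal((n_facts, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)

W = V.T @ K                     # sum of outer products v_i k_i^T, shape (d, d)

retrieved = K @ W.T             # row i is W @ k_i, the noisy readout for fact i
sims = retrieved @ V.T          # compare each readout against all stored values
accuracy = (sims.argmax(axis=1) == np.arange(n_facts)).mean()
print(f"recall accuracy: {accuracy:.2f}")   # close to 1.0 in this regime
```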
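The third is a minimal sketch of a quasiseparable matrix mixer in the spirit of Hydra: the token-mixing matrix decomposes into causal, anti-causal, and diagonal parts, so the bidirectional mix can be computed with two linear-time scans instead of a dense quadratic matrix multiply. The single scalar decay used here is an illustrative assumption; Hydra uses richer, data-dependent SSM parameterizations.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d = 16, 8
X = rng.standard_normal((L, d))
a = 0.9                                                     # assumed scalar decay

idx = np.arange(L)
lower = np.tril(a ** (idx[:, None] - idx[None, :]), k=-1)   # causal weights a^(t-j)
upper = lower.T                                             # anti-causal mirror a^(j-t)
M = lower + upper + np.eye(L)                               # quasiseparable mixer matrix
Y_dense = M @ X                                             # reference: O(L^2) dense mixing

# Same result with two O(L) recurrences h_t = a * (h_{t-1} + x_t), run forward
# and backward, which is how structured SSMs avoid the quadratic cost.
fwd, bwd = np.zeros_like(X), np.zeros_like(X)
h = np.zeros(d)
for t in range(L):
    fwd[t] = h
    h = a * (h + X[t])
h = np.zeros(d)
for t in reversed(range(L)):
    bwd[t] = h
    h = a * (h + X[t])
Y_scan = fwd + bwd + X
print(np.allclose(Y_dense, Y_scan))                         # True
```

The final check confirms that the two scans reproduce the dense mixer exactly, which is the structural property that makes the bidirectional mixing linear-time.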

Selected papers
- A Phase Transition between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention: https://arxiv.org/pdf/2402.03902
- Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers: https://arxiv.org/pdf/2407.09941
- Understanding Factual Recall in Transformers via Associative Memories: https://arxiv.org/pdf/2412.06538 

Practical information

  • General public
  • Free

Tags

EDIC candidacy exam
