Efficient Architecture via Structured Matrices

Event details

Date 26.08.2024
Hour 14:15 – 16:00
Speaker Xiuying Wei
Location
Category Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Martin Jaggi
Thesis advisor: Prof. Caglar Gulcehre
Co-examiner: Prof. Pascal Frossard

Abstract
In this work, we explore low-rank and block-diagonal matrices in neural networks from both accuracy and efficiency perspectives. We first present the seminal paper "Convolutional Neural Networks with Low-Rank Regularization" (ICLR 2016), which applies low-rank tensor decomposition to convolutional networks and adopts batch normalization to stabilize training. We then discuss two follow-up studies: "Monarch: Expressive Structured Matrices for Efficient and Accurate Training" (ICML 2022) and "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" (NeurIPS 2023). These papers build on block-diagonal matrices, proposing the Monarch parametrization to mix features and make the linear layers in Transformers more efficient. Building on these works, we conduct an extensive investigation of structured matrices for training recent large language models (LLMs) from scratch and for architectural design at significant scales, and report new findings.
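
For readers unfamiliar with the two structured-matrix families mentioned above, the sketch below illustrates both ideas in minimal form. It is an illustrative sketch assuming PyTorch, not code from any of the listed papers: `low_rank_linear` replaces a dense weight W with a rank-r product U V, and `monarch_matmul` implements one common Monarch-style factorization M = P L Pᵀ R, two block-diagonal matrices interleaved with transpose permutations (the exact permutation convention varies across the papers). All names and shapes here are hypothetical.

```python
import torch

def low_rank_linear(x, U, V):
    """Apply W @ x with W approximated as U @ V (rank r).

    x: (d_in,), V: (r, d_in), U: (d_out, r).
    Cost drops from O(d_out * d_in) to O(r * (d_in + d_out)).
    """
    return U @ (V @ x)

def monarch_matmul(x, L_blocks, R_blocks):
    """Apply a Monarch-style matrix M = P L P^T R to x.

    x: (n,) with n = b * b; L_blocks, R_blocks: (b, b, b),
    i.e., b dense blocks of size b x b on each block diagonal.
    Total cost is O(n^1.5) rather than O(n^2) for a dense matrix.
    """
    b = R_blocks.shape[0]
    x = x.reshape(b, b)                          # split x into b chunks of size b
    x = torch.einsum('kij,kj->ki', R_blocks, x)  # block-diagonal R: one matmul per chunk
    x = x.T.contiguous()                         # permutation P^T (a "riffle" transpose)
    x = torch.einsum('kij,kj->ki', L_blocks, x)  # block-diagonal L
    x = x.T.contiguous()                         # permutation P back
    return x.reshape(-1)

# Toy usage: n = 16, so b = 4 blocks of size 4 x 4.
b = 4
x = torch.randn(b * b)
L = torch.randn(b, b, b)
R = torch.randn(b, b, b)
y = monarch_matmul(x, L, R)
print(y.shape)  # torch.Size([16])
```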

Background papers
1. Convolutional Neural Networks with Low-Rank Regularization. https://arxiv.org/abs/1511.06067 
2. Monarch: Expressive Structured Matrices for Efficient and Accurate Training. https://arxiv.org/abs/2204.00595
3. Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture. https://arxiv.org/abs/2310.12109

Practical information

  • General public
  • Free

Tags

EDIC candidacy exam
