Efficient Architecture via Structured Matrices

Event details

Date 26.08.2024
Hour 14:15 – 16:00
Speaker Xiuying Wei
Location
Category Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Martin Jaggi
Thesis advisor: Prof. Caglar Gulcehre
Co-examiner: Prof. Pascal Frossard

Abstract
In this work, we explore low-rank and block-diagonal matrices in neural networks from both accuracy and efficiency perspectives. We first present the seminal paper "Convolutional Neural Networks with Low-Rank Regularization" (ICLR 2016), which applies low-rank tensor decomposition to convolutional networks and adopts batch normalization to stabilize training. We then discuss two follow-up studies: "Monarch: Expressive Structured Matrices for Efficient and Accurate Training" (ICML 2022) and "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" (NeurIPS 2023). These papers build on block-diagonal matrices, proposing the Monarch parametrization to mix features and make the linear layers in Transformers more efficient. Building on these works, we conduct an extensive investigation of structured matrices for training recent large language models (LLMs) from scratch and for architectural design at significant scales, and report new findings.
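
For readers unfamiliar with the two structured-matrix families mentioned above, the sketch below illustrates both ideas in minimal form. It is an illustrative sketch assuming PyTorch, not code from any of the listed papers: `low_rank_linear` replaces a dense weight W with a rank-r product U V, and `monarch_matmul` implements one common Monarch-style factorization M = P L Pᵀ R, two block-diagonal matrices interleaved with transpose permutations (the exact permutation convention varies across the papers). All names and shapes here are hypothetical.

```python
import torch

def low_rank_linear(x, U, V):
    """Apply W @ x with W approximated as U @ V (rank r).

    x: (d_in,), V: (r, d_in), U: (d_out, r).
    Cost drops from O(d_out * d_in) to O(r * (d_in + d_out)).
    """
    return U @ (V @ x)

def monarch_matmul(x, L_blocks, R_blocks):
    """Apply a Monarch-style matrix M = P L P^T R to x.

    x: (n,) with n = b * b; L_blocks, R_blocks: (b, b, b),
    i.e., b dense blocks of size b x b on each block diagonal.
    Total cost is O(n^1.5) rather than O(n^2) for a dense matrix.
    """
    b = R_blocks.shape[0]
    x = x.reshape(b, b)                          # split x into b chunks of size b
    x = torch.einsum('kij,kj->ki', R_blocks, x)  # block-diagonal R: one matmul per chunk
    x = x.T.contiguous()                         # permutation P^T (a "riffle" transpose)
    x = torch.einsum('kij,kj->ki', L_blocks, x)  # block-diagonal L
    x = x.T.contiguous()                         # permutation P back
    return x.reshape(-1)

# Toy usage: n = 16, so b = 4 blocks of size 4 x 4.
b = 4
x = torch.randn(b * b)
L = torch.randn(b, b, b)
R = torch.randn(b, b, b)
y = monarch_matmul(x, L, R)
print(y.shape)  # torch.Size([16])
```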

Background papers
1. Convolutional Neural Networks with Low-Rank Regularization. https://arxiv.org/abs/1511.06067 
2. Monarch: Expressive Structured Matrices for Efficient and Accurate Training. https://arxiv.org/abs/2204.00595
3. Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture. https://arxiv.org/abs/2310.12109

Practical information

  • General public
  • Free

Tags

EDIC candidacy exam
