Building Multi-task Models: From Multi-task Learning to Model Merging
Event details
Date | 27.02.2024 |
Hour | 13:00 – 15:00 |
Speaker | Ke Wang |
Location | |
Category | Conferences - Seminars |
EDIC candidacy exam
Exam president: Prof. Amir Zamir
Thesis advisor: Prof. Pascal Frossard
Co-examiner: Prof. Martin Jaggi
Abstract
Multi-task models efficiently use shared representations to handle multiple tasks with reduced storage requirements. While many multi-task learning (MTL) methods have been developed to train such models jointly and deliver strong performance, they often incur non-trivial training costs and require simultaneous access to all tasks. Recently, model merging techniques such as task arithmetic have emerged as training-free alternatives that construct multi-task models by directly merging separately fine-tuned models. However, these methods tend to underperform traditional MTL approaches. In this report, we explore three methods for constructing multi-task models: one based on MTL and two based on model merging. We also present our research on understanding the causes of the performance drop observed in model merging methods, along with our proposed approach to improving their performance.
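For context, task arithmetic (Ilharco et al., ICLR 2023) builds a multi-task model from task vectors, the parameter differences between each fine-tuned model and the shared pre-trained checkpoint. The following is a minimal sketch of this idea, not the speaker's implementation; the function name and the scaling coefficient lam (typically tuned on held-out data) are illustrative assumptions.

    # Task-arithmetic merge: theta_merged = theta_pre + lam * sum_i (theta_i - theta_pre)
    # Assumes all checkpoints share the same architecture and parameter names.
    import torch

    def task_arithmetic_merge(pretrained, finetuned_list, lam=0.3):
        """Merge fine-tuned state dicts into a single multi-task state dict."""
        merged = {}
        for name, theta_pre in pretrained.items():
            # Task vector for each task: difference from the pre-trained weights.
            task_vectors = [ft[name] - theta_pre for ft in finetuned_list]
            merged[name] = theta_pre + lam * torch.stack(task_vectors).sum(dim=0)
        return merged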
Background papers
- Towards Impartial Multi-task Learning, ICLR 2021
- Editing Models with Task Arithmetic, ICLR 2023
- TIES-Merging: Resolving Interference When Merging Models, NeurIPS 2023
Practical information
- General public
- Free