Feedback-based alignment of large language models

Event details
Date: 07.09.2023
Hour: 10:00 – 12:00
Speaker: Beatriz Borges Ribeiro
Category: Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Tanja Käser
Thesis advisor: Prof. Antoine Bosselut
Co-examiner: Prof. Robert West
Abstract
Feedback has become an increasingly popular avenue for aligning Large Language Models (LLMs) with human values. Among the many possible formulations of feedback, Natural Language Feedback (NLF) stands out as the richest and most diverse in the information it can convey. However, what makes NLF effective remains an open question, as current approaches are typically hand-designed and arbitrary. My work will first survey the most impactful models of feedback for human learning from the pedagogy literature. With this grounding, I will then propose novel approaches and systems for feedback-based model alignment that leverage finer feedback granularities than are currently used.
Background papers
- Proximal Policy Optimization Algorithms ( https://arxiv.org/abs/1707.06347 )
- Training language models to follow instructions with human feedback ( https://arxiv.org/abs/2203.02155 )
- Fine-Grained Human Feedback Gives Better Rewards for Language Model Training ( https://arxiv.org/abs/2306.01693 )
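To make the feedback-granularity contrast raised in the abstract concrete, here is a minimal, hypothetical Python sketch: it compares a single sequence-level reward, as used in standard RLHF pipelines (e.g. the instruction-following paper above), with per-segment rewards in the spirit of the fine-grained feedback paper. All function names and the scoring heuristics are illustrative assumptions, standing in for learned reward models; this is not the speaker's method.

```python
# Hypothetical sketch: holistic vs. fine-grained reward signals for alignment.
from typing import List

def holistic_reward(response: str) -> float:
    """Sequence-level feedback: one scalar for the entire response,
    as in standard RLHF (placeholder heuristic for a learned reward model)."""
    return 1.0 if "because" in response else 0.0

def fine_grained_rewards(segments: List[str]) -> List[float]:
    """Segment-level feedback: one scalar per sentence/segment,
    in the spirit of fine-grained human feedback (placeholder heuristics
    standing in for per-aspect reward models such as relevance or factuality)."""
    return [0.5 * ("because" in seg) + 0.5 * (len(seg.split()) < 20)
            for seg in segments]

response = ("The sky appears blue because air scatters short wavelengths. "
            "This effect is known as Rayleigh scattering.")
segments = [s.strip() + "." for s in response.split(".") if s.strip()]

print("holistic reward:", holistic_reward(response))            # one signal per response
print("fine-grained rewards:", fine_grained_rewards(segments))  # one signal per segment
```

The finer-grained variant attributes credit to specific parts of the output rather than to the response as a whole, which is the kind of denser training signal the abstract refers to as "finer feedback granularities".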
Practical information
- General public
- Free