BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Memento EPFL//
BEGIN:VEVENT
SUMMARY:Feedback-based alignment of large language models
DTSTART:20230907T100000
DTEND:20230907T120000
DTSTAMP:20260407T112437Z
UID:3d96abb4a13b0452d8d02ea95f81f1231d16aec96c2a4c44202699c1
CATEGORIES:Conferences - Seminars
DESCRIPTION:Beatriz Borges Ribeiro\nEDIC candidacy exam\nExam president: 
 Prof. Tanja Käser\nThesis advisor: Prof. Antoine Bosselut\nCo-examiner: 
 Prof. Robert West\n\nAbstract\nFeedback has become an increasingly popula
 r avenue for aligning Large Language Models (LLMs) with human values. Amo
 ng the many possible formulations of feedback\, Natural Language Feedback
  (NLF) stands out as the richest and most diverse in the information it c
 an convey. However\, what makes NLF effective remains an open question\, 
 as current approaches are typically hand-designed and arbitrary. My work 
 will first survey the most impactful feedback models in pedagogy for huma
 n learning. Then\, with this grounding\, I will propose novel approaches 
 and systems for feedback and model alignment\, aiming to leverage finer-g
 rained feedback than is currently used.\n\nBackground papers\n\n	Proximal
  Policy Optimization Algorithms ( https://arxiv.org/abs/1707.06347 )\n	Tr
 aining language models to follow instructions with human feedback ( https
 ://arxiv.org/abs/2203.02155 )\n	Fine-Grained Human Feedback Gives Better 
 Rewards for Language Model Training ( https://arxiv.org/abs/2306.01693 )\n
LOCATION:BC 133 https://plan.epfl.ch/?room==BC%20133
STATUS:CONFIRMED
END:VEVENT
END:VCALENDAR
