NLP Seminar: Improving Representations for Language Modeling

Event details

Date: 05.11.2024
Hour: 11:00-12:00
Speaker: Nathan Godey
Location: Online
Category: Conferences - Seminars
Event Language: English

Nathan Godey is a final-year PhD student at INRIA Paris, presenting his recent work on improving representations for language modeling.

Summary:
Generative models (e.g. Llama) have now largely replaced traditional predictive models (e.g. BERT) across a variety of tasks, driving language systems to prioritize broad generative capability over strong feature extraction. As a consequence, recent models are often used as black-box systems that are dissected only for explanation or interpretation purposes. In our work, we find that observing high-level characteristics of the representations these models produce can provide insight into the inherent limitations of the LLM paradigm, by exposing biases and distortions that emerge both from the nature of the training data and from the inductive biases of model architectures.
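
To make the kind of diagnostic this refers to concrete, below is a minimal Python sketch (not the speaker's code) of one common high-level representation measurement: the mean pairwise cosine similarity of hidden states, often used as a proxy for anisotropy. The model choice (gpt2) and the probe sentence are illustrative assumptions.

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Assumption: any causal LM with accessible hidden states works here.
    model_name = "gpt2"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    text = "Representation analysis can expose biases in language models."
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)

    # Mean off-diagonal cosine similarity per layer: values near 1.0
    # indicate highly anisotropic (narrowly clustered) representations.
    for layer, h in enumerate(outputs.hidden_states):
        h = torch.nn.functional.normalize(h.squeeze(0), dim=-1)  # (seq_len, dim)
        sim = h @ h.T
        n = sim.size(0)
        mean_sim = (sim.sum() - n) / (n * (n - 1))  # exclude self-similarity
        print(f"layer {layer:2d}: mean cosine similarity = {mean_sim.item():.3f}")
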
Our work not only reveals key bottlenecks but also motivates alternatives to standard modeling approaches, including a neural tokenization layer that enhances robustness and a contrastive LM objective that improves training efficiency, and it paves the way for compression schemes aimed at more memory-efficient generative modeling. Overall, this presentation shows how representation analysis can shed light on fundamental modeling limitations while inspiring new approaches to overcome them.
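
As a hedged illustration of what a contrastive LM objective can look like (in spirit only; the exact formulation in the speaker's work may differ), the sketch below replaces the usual full-vocabulary softmax with an InfoNCE-style loss that matches each hidden state to the embedding of its true next token against in-batch negatives. All names and the temperature value are assumptions made for the example.

    import torch
    import torch.nn.functional as F

    def contrastive_lm_loss(hidden, target_embeddings, temperature=0.1):
        """hidden: (N, d) predicted representations; target_embeddings:
        (N, d) embeddings of the true next tokens at the same positions."""
        h = F.normalize(hidden, dim=-1)
        t = F.normalize(target_embeddings, dim=-1)
        logits = h @ t.T / temperature      # (N, N) similarity matrix
        labels = torch.arange(h.size(0))    # positives lie on the diagonal
        return F.cross_entropy(logits, labels)

    # Toy usage with random tensors standing in for model outputs.
    loss = contrastive_lm_loss(torch.randn(8, 64), torch.randn(8, 64))
    print(loss.item())

Compared with a full softmax over the vocabulary, such an objective only scores the targets present in the batch, which is one way a contrastive formulation can reduce training cost.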

Practical information

  • Informed public
  • Free

Tags

Large language model, efficiency, geometry of language models, contrastive learning
