Towards Multimodal Technologies for the World
There has been an explosive growth of vision-and-language architectures in the last few years, which are usually trained on English captions paired with images from North America or Western Europe.
In this talk, Emanuele will first introduce a new protocol for collecting culturally relevant images and captions, which resulted in MaRVL, a multimodal reasoning dataset in five diverse languages. He will then discuss the limitations of state-of-the-art models when evaluated on multilingual data, an evaluation made possible by the IGLUE benchmark.
Finally, he will show that we can substantially improve zero-shot cross-lingual transfer by compromising our ideals of multilingual multimodal data.
Emanuele Bugliarello received his MSc from the IC School at EPFL in 2018, and he is currently a final-year PhD Fellow in the NLP Section at the University of Copenhagen. His research lies at the intersection of language and vision, with a particular interest in building models and creating resources that represent the diversity of cultural and linguistic backgrounds.
Practical information
- General public
- Free