Using structured knowledge for solving multilingual NLP tasks

Event details

Date 22.07.2021
Hour 13:00-15:00
Speaker Marija Sakota
Category Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Tanja Käser
Thesis advisor: Prof. Robert West
Co-examiner: Prof. Boi Faltings

Abstract
In recent years, large language models pre-trained in an unsupervised manner have become increasingly popular. However, most of them focus solely on English, or are monolingual models for other widely spoken languages. Generating text in low-resource languages therefore remains challenging. An alternative to monolingual models is multilingual models, which handle multiple languages within a single model and can potentially exploit knowledge from high-resource languages to improve performance on low-resource ones. Although pre-training can enable a model to learn some facts and relations, it does not reliably enforce their use. One solution is to incorporate structured knowledge into these models, for example through knowledge graphs. Integrating text and knowledge graph information is non-trivial, mainly because of the structural differences between the two. In this proposal, first, a new method for pre-training complete sequence-to-sequence models by denoising text in multiple languages is introduced. Then, a method for incorporating information from knowledge graphs through a new encoder is presented. Next, a new form of extreme summarization for scientific articles and a method to solve it are showcased. Finally, possible research directions for multilingual models that use structured data are discussed.


Background papers
1) Multilingual Denoising Pre-training for Neural Machine Translation, Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, and Luke Zettlemoyer: https://www.aclweb.org/anthology/2020.tacl-1.47.pdf
2) Text Generation from Knowledge Graphs with Graph Transformers, Rik Koncel-Kedziorski, Dhanush Bekal, Yi Luan, Mirella Lapata, and Hannaneh Hajishirzi: https://www.aclweb.org/anthology/N19-1238.pdf
3) TLDR: Extreme Summarization of Scientific Documents, Isabel Cachola, Kyle Lo, Arman Cohan, and Daniel S. Weld: https://www.aclweb.org/anthology/2020.findings-emnlp.428.pdf

Practical information

  • General public
  • Free
