IC Monday Seminar: Inducing Meaning Representations from Text

Event details
Date | 07.05.2012 |
Hour | 16:15 - 17:30 |
Speaker | Dr. Ivan Titov, Saarland University - IC Faculty candidate |
Location | |
Category | Conferences - Seminars |
Abstract
Language understanding by machines is one of the principal objectives of artificial intelligence research. Though full understanding of unrestricted texts is still a remote goal, statistical approaches have been developed in recent years to predict shallower forms of semantics, such as the underlying predicate-argument structure of sentences. Most existing statistical techniques for tackling these problems rely on large human-annotated datasets, which are expensive to create and exist only for a very limited number of languages. Even then, they are not very robust, cover only a small proportion of semantic constructions appearing in the labeled data, and are domain-dependent. We investigate Bayesian models which do not use any labeled data but induce semantic representations from unannotated texts. Unlike semantically annotated data, unannotated texts are plentiful and available for many languages and many domains, which makes our approach particularly promising. We show that these models induce linguistically plausible semantic representations, significantly outperform current state-of-the-art approaches, and yield competitive results on question answering in the biomedical domain. We also look into several extensions of the model, and specifically consider multilingual induction of semantics, where we show that multilingual parallel texts provide a valuable source of indirect supervision for induction of shallow semantic representations.
Biography
Ivan Titov joined Saarland University as a junior faculty member and head of a research group in November 2009, following a postdoc at the University of Illinois at Urbana-Champaign. He received his Ph.D. in Computer Science from the University of Geneva in 2008 and his master's degree in Applied Mathematics and Informatics from the St. Petersburg State Polytechnic University (Russia) in 2003. His current research interests are in statistical natural language processing (models of syntax, semantics and sentiment) and machine learning (structured prediction methods, latent variable models, Bayesian methods).
Practical information
- Informed public
- Free
- This event is internal
Organizer
- Christine Moscioni
Contact
- Christine Moscioni