Towards novel evaluation methods for neural dialog systems
Event details
Date | 09.07.2019 |
Hour | 14:00 › 16:00 |
Speaker | Ekaterina Svikhnushina |
Location | |
Category | Conferences - Seminars |
EDIC candidacy exam
Exam president: Dr. Martin Rajman
Thesis advisor: Dr. Pearl Pu Faltings
Co-examiner: Prof. Robert West
Abstract
The recent success of sequence-to-sequence neural networks has inspired intensive research on the task of human-like dialog generation. However, evaluating response-generation models remains an impeding factor: no reliable automatic metric is available, while human experiments are expensive. As a result, establishing a sound evaluation metric for open-domain dialog systems is still an open research problem, which we aim to address in our thesis. In this proposal, we first introduce the context of neural dialog generation. Then we examine why evaluation metrics from other natural language processing domains are inapplicable to this task. Finally, we discuss the strengths and weaknesses of a recently proposed automatic evaluation metric.
Background papers
A Neural Conversational Model. (2015), by O. Vinyals and Q. Le.
How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation. (2016), by C. Liu, R. Lowe, et al.
RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems. (2018), by C. Tao et al.
Practical information
- General public
- Free