DH Seminar Lecture - Network Inference from Textual Evidence: Information Propagation, Translation, and Multi-Input Attention

Event details

Date 19.02.2019
Hour 16:00-17:00
Speaker Prof. David Smith
Location
Category Conferences - Seminars
Abstract
Mass digitization has provided a mountain of source material for the humanities and social sciences, but its structure is unevenly mapped. Dependencies among documents arise when copying manuscripts, citing scholarly literature, speaking from talking points, reposting social networking content, popularizing scientific papers, or otherwise transforming earlier sources. While some dependencies are observable (e.g., by citations or links), we often need to infer them from textual evidence. In our Viral Texts and Oceanic Exchanges projects, we have built models to trace information flow within and across languages in poorly OCR'd newspapers. Other projects in our group infer and exploit such dependencies to model the writing of legislation, the impact of scientific press releases, and changes in the syntax of language.
 
I discuss methods for inferring these dependency structures and exploiting them to improve other tasks. First, I describe a directed spanning tree model of information cascades and a new unsupervised contrastive training procedure that outperforms previous approaches to network inference. I then describe extracting parallel passages from non-parallel multilingual corpora by performing efficient search in the continuous document-topic simplex of a polylingual topic model; translation systems trained on these passages are more accurate than those trained on smaller clean datasets. Finally, I describe methods for detecting multiple transcriptions of the same passage in a large corpus of noisy OCR and for exploiting these multiple witnesses to correct noisy text. These multi-input attention models provide efficient approximations to intractable multi-sequence alignment for collation and enable 75% reductions in error with unsupervised models.
 

Practical information

  • General public
  • Free

Organizer

  • DHI
