BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Memento EPFL//
BEGIN:VEVENT
SUMMARY:DH Seminar Lecture - Network Inference from Textual Evidence: Info
 rmation Propagation\, Translation\, and Multi-Input Attention
DTSTART:20190219T160000
DTEND:20190219T170000
DTSTAMP:20260407T162131Z
UID:185d86ecf3bbb04d7ada7ce761404796487ff22ce90b48c6b6781d55
CATEGORIES:Conferences - Seminars
DESCRIPTION:Prof. David Smith\nAbstract\nMass digitization has provided
  a mountain of source material for the humanities and social sciences\,
  but its structure is unevenly mapped. Dependencies among documents
  arise when copying manuscripts\, citing scholarly literature\, speaking
  from talking points\, reposting social networking content\,
  popularizing scientific papers\, or otherwise transforming earlier
  sources. While some dependencies are observable (e.g.\, via citations
  or links)\, we often need to infer them from textual evidence. In our
  Viral Texts and Oceanic Exchanges projects\, we have built models to
  trace information flow within and across languages in poorly OCR'd
  newspapers. Other projects in our group infer and exploit such
  dependencies to model the writing of legislation\, the impact of
  scientific press releases\, and changes in the syntax of
  language.\n\nI discuss methods for inferring these dependency
  structures and exploiting them to improve other tasks. First\, I
  describe a directed spanning tree model of information cascades and a
  new unsupervised contrastive training procedure that outperforms
  previous approaches to network inference. I then describe extracting
  parallel passages from non-parallel multilingual corpora by efficient
  search in the continuous document-topic simplex of a polylingual topic
  model\, yielding translation systems more accurate than those trained
  on smaller clean datasets. Finally\, I describe methods for detecting
  multiple transcriptions of the same passage in a large corpus of noisy
  OCR and for exploiting these multiple witnesses to correct noisy
  text. These multi-input attention models provide efficient
  approximations to intractable multi-sequence alignment for collation
  and enable 75% reductions in error with unsupervised models.
LOCATION:BC 420 https://plan.epfl.ch/?room==BC%20420
STATUS:CONFIRMED
END:VEVENT
END:VCALENDAR
