BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Memento EPFL//
BEGIN:VEVENT
SUMMARY:IC Colloquium: Fast and Effective Analytics for Big Multi-Dimensio
 nal Data
DTSTART:20220407T100000
DTEND:20220407T110000
DTSTAMP:20260406T073856Z
UID:50eac6456f96914252aa94f6558012a19a41a1c7ed4ffa058649f890
CATEGORIES:Conferences - Seminars
DESCRIPTION:By: John Paparrizos - University of Chicago\nIC Facult
 y candidate\n\nAbstract\nToday\, automated processes\, Internet‑of‑Thi
 ngs deployments\, and Web and mobile applications generate an overwhelming
  amount of high‑dimensional data. Meanwhile\, computational resources re
 main limited\, and advances in machine learning (ML) create a pressing nee
 d to support increasingly expensive and complex analytical tasks. Unfortun
 ately\, traditional data management techniques offer limited support for h
 igh‑dimensional data\, ML tasks\, and adaptation to data properties\, of
 ten resulting in reduced performance. Similarly\, due to the difficulty of
  providing invariances to specific data distortions\, applications often r
 esort to inadequate ML methods\, reducing their effectiveness.\n\nIn my wo
 rk\, I ask how we can address the lack of task‑aware and data‑driven a
 daptations in data management and ML methods. Specifically\, I will discus
 s two solutions for (i) data representations and (ii) computational method
 s using techniques to exploit similarities\, shapes\, densities\, and dist
 ributions in data. Motivated by the ubiquity of high-dimensional time seri
 es\, I will first present a method for anomaly detection in streaming data
  to account for distribution drifts. Then\, I will discuss a variance-awar
 e quantization method for indexing high-dimensional data that enables simi
 larity search queries at scale. In both examples\, the proposed methods su
 bstantially improve performance and accuracy\, demonstrating the benefit o
 f designing task-aware and data-driven solutions for large-scale data scie
 nce applications.\n\nBio\nJohn Paparrizos is a postdoctoral researcher at 
 the University of Chicago. He works in the area of advanced database syste
 ms with a focus on enabling complex analytics for high-dimensional data\, 
 supporting the next generation of data-intensive and machine learning appl
 ications. John completed his Ph.D. at Columbia University and earned his M
 .S. from EPFL. His research has received multiple distinctions\, including
  a "Best of SIGMOD" selection\, an ACM SIGMOD Research Highlight Award\, a
  recognition of his Ph.D. thesis at the ACM SIGKDD Dissertation Award comp
 etition\, and a NetApp Faculty Award. His ideas have been adopted in domai
 ns including energy\, medicine\, biology\, and neuroscience\, and by organ
 izations including Fortune 100 companies and the European Space Agency. Se
 veral media outlets have covered his research\, including The New Yor
 k Times\, Washington Post\, Guardian\, and MIT Technology Review.
LOCATION:BC 420 https://plan.epfl.ch/?room==BC%20420 https://epfl.zoom.us/
 j/65544071377?pwd=Z0taeENpUnR0dGZyOG0zZGc3cHA2dz09
STATUS:CONFIRMED
END:VEVENT
END:VCALENDAR
