DB Seminar: Platforms and Applications for “Big and Fast” Data Analytics

Event details

Date	07.11.2014
Hour	12:30 - 13:30
Speaker	Prof. Yanlei Diao http://people.cs.umass.edu/~yanlei/
Location	INM10
Category	Conferences - Seminars

Recently there has been significant interest in building big data systems that can handle not only "big data" but also "fast data" for analytics. Our work is strongly motivated by recent real-world case studies that point to the need for a general, unified data processing framework that supports analytical queries with different latency requirements. Towards this goal, our project is designed to transform the popular MapReduce computation model, originally proposed for batch processing, into a model for distributed (near) real-time processing.
In this talk, I will start by examining the widely used Hadoop system and presenting a thorough analysis of the causes of high latency in Hadoop. I will then present a number of necessary architectural changes, as well as new resource configuration and optimization techniques that meet user-specified latency requirements while maximizing throughput. Experiments using typical workloads in click stream analysis and Twitter feed analysis show that our techniques reduce latency from tens or hundreds of seconds in Hadoop to sub-second in our system, with a 2x-7x increase in throughput. Our system also outperforms the state-of-the-art distributed stream systems Twitter Storm and Spark Streaming by a wide margin. Finally, I will show some initial results and challenges of supporting big and fast data analytics in the emerging domain of genomics.

Practical information

Informed public
Free

Organizer

Prof. Anastasia Ailamaki
Prof. Christoph Koch

Contact

Dimitra Tsaoussis
