DB Seminar: Platforms and Applications for “Big and Fast” Data Analytics

Event details

Date 07.11.2014
Hour 12:30-13:30
Speaker Prof. Yanlei Diao http://people.cs.umass.edu/~yanlei/
Location
Category Conferences - Seminars
Recently there has been significant interest in building big data systems that can handle not only “big data” but also “fast data” for analytics. Our work is strongly motivated by recent real-world case studies that point to the need for a general, unified data processing framework to support analytical queries with different latency requirements. Towards this goal, our project is designed to transform the popular MapReduce computation model, originally proposed for batch processing, into a model for distributed (near) real-time processing.
In this talk, I start by examining the widely used Hadoop system and presenting a thorough analysis of the causes of high latency in Hadoop. I then present a number of necessary architectural changes, as well as new resource configuration and optimization techniques, to meet user-specified latency requirements while maximizing throughput. Experiments using typical workloads in click stream analysis and Twitter feed analysis show that our techniques reduce the latency from tens or hundreds of seconds in Hadoop to sub-second in our system, with a 2x-7x increase in throughput. Our system also outperforms state-of-the-art distributed stream systems, Twitter Storm and Spark Streaming, by a wide margin. Finally, I will show some initial results and challenges of supporting big and fast data analytics in the emerging domain of genomics.

Practical information

  • Informed public
  • Free

Organizer

  • Prof. Anastasia Ailamaki
    Prof. Christoph Koch

Contact

  • Dimitra Tsaoussis

Tags

big data database
