Scalable Data Analytics: The Role of Stratified Data Sharding


Event details

Date 07.11.2017
Hour 11:00-12:00
Speaker Professor Srinivasan Parthasarathy, The Ohio State University
Location
Category Conferences - Seminars

With the increasing popularity of structured data stores, social networks, and Web 2.0 and 3.0 applications, complex data formats such as trees and graphs are becoming ubiquitous. Managing and processing such large and complex data stores on modern computational ecosystems, and deriving actionable information from them efficiently, is daunting. In this talk I will begin by discussing some of these challenges. I will then argue that a critical element at the heart of this challenge lies in the sharding, placement, storage, and access of such tera- and petascale data. In this work we develop a novel distributed framework that eases the burden on the programmer, and we propose an agile and intelligent placement service layer as a flexible yet unified means to address this challenge. Central to our framework is the notion of stratification, which first groups structurally (or semantically) similar entities into strata. The strata are then partitioned across the ecosystem according to the needs of the application: to maximize locality, balance load, minimize data skew, or even take energy consumption into account. Results on several real-world applications validate the efficacy and efficiency of our approach. (Notes: Joint work with Y. Wang (Airbnb) and A. Chakrabarti (MSR))
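To make the two-step idea (stratify, then place strata) concrete, here is a minimal sketch in Python. It is not the speakers' framework: the exact-match `key` function used for grouping, the function names `stratify`, `place_for_locality`, and `place_for_balance`, and the greedy largest-first placement heuristic are all hypothetical stand-ins chosen for illustration. The real system groups entities by structural or semantic similarity, which this sketch simplifies to grouping by an identical signature.

```python
from collections import defaultdict
import heapq

def stratify(entities, key):
    """Group entities into strata by a signature.
    Assumption: `key` returns an exact structural label; a real system
    would group by similarity (e.g. via hashing), not exact equality."""
    strata = defaultdict(list)
    for e in entities:
        strata[key(e)].append(e)
    return list(strata.values())

def place_for_locality(strata, num_shards):
    """Keep each stratum whole on one shard (locality), assigning the
    largest strata first to the currently least-loaded shard
    (a greedy longest-processing-time heuristic, chosen for illustration)."""
    shards = [[] for _ in range(num_shards)]
    heap = [(0, i) for i in range(num_shards)]  # (current load, shard index)
    for stratum in sorted(strata, key=len, reverse=True):
        load, i = heapq.heappop(heap)
        shards[i].extend(stratum)
        heapq.heappush(heap, (load + len(stratum), i))
    return shards

def place_for_balance(strata, num_shards):
    """Spread every stratum round-robin so each shard receives a similar
    mix of entity types (reduces data skew at the cost of locality)."""
    shards = [[] for _ in range(num_shards)]
    pos = 0
    for stratum in strata:
        for e in stratum:
            shards[pos % num_shards].append(e)
            pos += 1
    return shards

if __name__ == "__main__":
    # Hypothetical entities: (id, structural type).
    entities = [("a", "tree"), ("b", "tree"), ("c", "graph"),
                ("d", "tree"), ("e", "graph"), ("f", "chain")]
    strata = stratify(entities, key=lambda e: e[1])
    print([[e[0] for e in s] for s in place_for_locality(strata, 2)])
    # prints [['a', 'b', 'd'], ['c', 'e', 'f']]
    print([[e[0] for e in s] for s in place_for_balance(strata, 2)])
    # prints [['a', 'd', 'e'], ['b', 'c', 'f']]
```

The two placement functions show the trade-off the abstract mentions: keeping strata intact favors locality, while scattering them evens out skew. An application-aware placement layer would choose between (or blend) such policies per workload.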

Practical information

  • Informed public
  • Free

Organizer

  • Professor Anastasia Ailamaki

Contact

  • Dimitra Tsaoussis-Melissargos

Tags

Data mining, high performance computing, big data
