BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Memento EPFL//
BEGIN:VEVENT
SUMMARY:Scaling database systems to high-performance computers
DTSTART:20180423T140000
DTEND:20180423T150000
DTSTAMP:20260407T095809Z
UID:8b149c9ac6e7abeb270bb8360504c9a8dee11fcafd92219265fe5963
CATEGORIES:Conferences - Seminars
DESCRIPTION:Spyros Blanas\nProcessing massive datasets quickly requires wa
 rehouse-scale computers. Furthermore\, many massive datasets are multi-dim
 ensional arrays which are stored in formats like HDF5 and NetCDF that cann
 ot be directly queried using SQL. Parallel array database systems like Sci
 DB cannot scale in a warehouse-scale environment\, which offers fast netw
 orking but very limited I/O bandwidth to shared\, cold storage: merely lo
 ading multi-TB arr
 ay datasets in SciDB would take days--an unacceptably long time for many a
 pplications.\n\nIn this talk\, we will present ArrayBridge\, a common inte
 roperability layer for array file formats. ArrayBridge allows scientists t
 o use SciDB\, TensorFlow and HDF5-based code in the same file-centric anal
 ysis pipeline without converting between file formats. Under the hood\, Ar
 rayBridge manages I/O to leverage the massive concurrency of warehouse-sca
 le parallel file systems without modifying the HDF5 API and breaking backw
 ards compatibility with legacy applications. Once the data has been loaded
  in memory\, the bottleneck in many array-centric queries becomes the spee
 d of data repartitioning between different nodes. We will present an RDMA-
 aware data shuffling abstraction that communicates directly with the netwo
 rk adapter through InfiniBand verbs and can repartition data up to 4X fast
 er than MPI. We conclude by highlighting research challenges that must b
 e overcome for data processing to scale to warehouse-scale computers.
LOCATION:BC 410 https://plan.epfl.ch/?room==BC%20410
STATUS:CONFIRMED
END:VEVENT
END:VCALENDAR
