Conferences - Seminars

  Friday 8 December 2017 11:00 - 12:00 BC 420

What impact can Integrated Photonics have on data center architecture?

By Sébastien Rumley, Research Scientist in the Lightwave Research Laboratory, Columbia University, New York
Bio: Prior to joining Columbia in 2012, he was at EPFL, where he received his M.S. and Ph.D. degrees in communication systems. His research focuses on multilayer, cross-scale modeling and optimization of large-scale interconnection networks. This includes analysis of nanophotonic devices and their integration in next-generation computing systems, network topology design and dimensioning, characterization of data-movement requirements, and end-to-end evaluation of interconnect power consumption. He is also interested in novel, post-Moore’s Law computer architectures relying on photonic connectivity. Dr. Rumley is co-author of over 70 publications in the fields of optical interconnects and optical networks. He has served or is serving as Program Committee Member for the SuperComputing, ISC-HPC and NETWORKS conferences, as well as for various workshops (HiPINEB, Exacomm, HUCAA, AISTECS, OPTICS, HCPM, IA^3). He is a co-recipient of the Best Student Paper Award at the 2016 edition of SuperComputing.

Abstract: Big data analytics applications that rely on machine learning and deep learning techniques are profoundly changing the landscape of datacenter architectures. Image and speech recognition tasks are now routinely executed by datacenters. These tasks, however, demand much more computing power than traditional ones: tens of GFLOP to recognize one image, compared to tens of MFLOP for an SQL query. As a result, graphics processing units (GPUs) are invading datacenters, and will likely be followed by hardware highly optimized for machine learning, such as Google’s Tensor Processing Unit (TPU). Yet the emergence of this novel computing hardware is only one facet of the ongoing datacenter transformation. Another transformation must occur in terms of interconnections. Indeed, the performance of machine-learning-optimized chips is increasingly limited by off-chip communications. The concept of the disaggregated datacenter, proposed by many actors (HPE’s The Machine/Moonshot, Intel RSD, Open Compute) as a way to use IT hardware more efficiently, is also running up against the high cost and power consumption of interconnects. In general, inter-component communications are increasingly acting as a bottleneck to datacenter performance.

Integrated photonics is (and has been) frequently evoked as a way to alleviate major bandwidth bottlenecks, and silicon photonics has for years been presented as the default path to low-cost, low-power photonics. In this talk, we will review the progress made in silicon photonics transceiver fabrication in recent years, and show how these transceivers can be closely integrated with conventional chips. The concept of the Optically Connected Multi-Chip Module (OC-MCM), composed of a silicon interposer with embedded optical connectivity and carrying a high-performance ASIC such as a CPU, a GPU, an FPGA, or several memory dies, will be presented. The expected connectivity figures of merit of such OC-MCMs will be summarized. We will then consider how the datacenter architecture can be reorganized around discrete compute or memory building blocks, each block being assembled as one OC-MCM. The distance independence of optics allows, in principle, a CPU block to communicate directly with a memory block located several meters away. Treating such a remote memory as a local one can be valuable when executing tasks with high memory capacity or bandwidth needs. A CPU block cannot be directly connected to every memory block, however, so the right trade-off in terms of connectivity must be identified. More generally, we will discuss the impact this OC-MCM approach can have on system design. Finally, we will see how the connections between OC-MCMs can be reconfigured by means of optical switches. We will show that this reconfigurability can be exploited to adapt the hardware architecture to specific needs, e.g. machine learning model training, but will also point out some of the challenges it raises, in particular in terms of scheduling.

Organization Babak Falsafi

Contact Stéphanie Baillargues

Accessibility Informed public

Admittance Free