BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Memento EPFL//
BEGIN:VEVENT
SUMMARY:Practical and Efficient Near-Data Processing for In-Memory Analyti
 cs
DTSTART:20160310T133000
DTEND:20160310T150000
DTSTAMP:20260406T222211Z
UID:04eb9ac585688694a704d4cb75512c7954b244a2b945cfd289193182
CATEGORIES:Conferences - Seminars
DESCRIPTION:Mingyu Gao\, Ph.D. candidate in the Department of Electrical E
 ngineering\, Stanford University\nBio :\nMingyu Gao is a Ph.D. candida
 te in the Department of Electrical Engineering\, Stanford University. His 
 research interests are computer architecture and systems. He currently
  works with Professor Christos Kozyrakis in the Multi-scale Architecture
  & Systems Team (MAST)\, investigating energy-efficient memory systems
  and accele
 rators for analytics applications and datacenter services. Specifically\, 
 his work focuses on efficient and practical near-data processing for DRAM-
 based memory systems\, low-power high-density reconfigurable acceleration 
 fabrics\, and system integration of non-volatile memory technologies. Ming
 yu received his Master of Science degree in Electrical Engineering from
  Stanford University in June\, 2014. Before coming to Stanford\, he
  received his Bachelor of Science degree in Microelectronics from
  Tsinghua University\, Beijing\,
  China\, in June\, 2012.\nAbstract :\nThe end of Dennard scaling has made 
 all systems energy-constrained. For data-intensive applications with limit
 ed temporal locality\, the best way to optimize energy is to place process
 ing near the data in main memory. In this talk\, we develop the hardware a
 nd software support for a practical NDP architecture based on 3D integrati
 on. First\, focusing on general-purpose cores\, we develop simple but scal
 able hardware support for coherence\, communication\, and synchronization\
 , and a runtime system that is sufficient to support analytics\, graph pro
 cessing\, and deep neural network frameworks with complex data patterns w
 hile hiding all the details of the NDP hardware. We also investigate the b
 alance between processing and memory throughput\, scalability\, and th
 e importance of software optimization for spatial locality. This NDP archi
 tecture provides up to 16x performance and energy advantage over conventio
 nal approaches\, and 2.5x over recently-proposed NDP systems. Next\, we fo
 cus on the processing elements in the NDP stack. Processing elements based
  on reconfigurable logic have been proposed as a compromise between the ef
 ficiency of custom engines and the flexibility of programmable cores. Unfo
 rtunately\, conventional FPGAs and CGRAs incur significant area and power 
 overheads\, respectively. We develop Heterogeneous Reconfigurable Logic
  (HRL
 )\, a reconfigurable array for NDP systems that combines coarse-grained an
 d fine-grained logic blocks and separates routing networks for data and co
 ntrol signals. HRL has the power efficiency of FPGA and the area efficienc
 y of CGRA. It improves performance per Watt by 2.2x over FPGA and 1.7x ove
 r CGRA\, and achieves 92% of the peak performance of an NDP system based o
 n custom accelerators.\nRefreshments will be available before the talk\,
  starting at 1:15 pm.
LOCATION:BC 420 https://plan.epfl.ch/?room==BC%20420
STATUS:CONFIRMED
END:VEVENT
END:VCALENDAR