BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Memento EPFL//
BEGIN:VEVENT
SUMMARY:Enabling Efficient Communication in Large Heterogeneous Processors
DTSTART:20150310T161500
DTEND:20150310T173000
DTSTAMP:20260508T083550Z
UID:6b917cf62b1aeb7cce08acdc7e10f56a324801df48ec48b313473929
CATEGORIES:Conferences - Seminars
DESCRIPTION:Brad Beckmann\, member of AMD Research in Bellevue\, WA\, USA.
 \nGraphics processing units (GPUs) provide tremendous throughput with
  outstanding performance-to-power ratios when executing data-parallel
  code. Meanwhile\, CPUs remain the best at executing sequential
  control code\, so most current designs integrate both types of
  devices into the same processor. To allow programmers to fully
  leverage their diverse computational power\, these integrated CPU/GPU
  designs must be architected in a cohesive and synergistic manner. In
  that vein\, our research builds upon the recently published
  Heterogeneous System Architecture (HSA) specification\, which provides
  (among other things) a system architecture where all devices within a
  node (e.g.\, CPU\, GPU\, and other accelerators) share a single\,
  unified\, virtual memory space. This allows applications to be written
  so that CPU and GPU code can freely exchange pointers without
  expensive memory transfers over PCIe\, marshalling of data structures\,
  or complicated
  device-specific memory allocation.\nThis talk will discuss our research
  that enables efficient communication across large heterogeneous
  systems. In particular\, I will describe a set of solutions that
  localize communication and synchronization within an HSA-compatible
  heterogeneous processor. These solutions include a novel hardware
  mechanism\, called QuickRelease\, that enables GPU memory systems to
  efficiently support fine-grain load-acquire/store-release
  synchronization between GPU threads without sacrificing throughput.
  The solutions also include a set of memory consistency models\, called
  Heterogeneous-Race-Free (HRF) memory models\, that provide programmers
  with a well-defined framework for reasoning about large on-chip memory
  systems. Finally\, I will introduce a new synchronization primitive\,
  called remote scope promotion\, that allows programmers to use
  lower-latency localized synchronization more frequently\, rather than
  longer-latency global synchronization.
LOCATION:BC 420 https://plan.epfl.ch/?room==BC%20420
STATUS:CONFIRMED
END:VEVENT
END:VCALENDAR