A Low-Overhead Asynchronous Interconnection Network for GALS Chip Multiprocessors

Event details
Date: 21.03.2011
Hour: 11:15
Speaker: Prof. Steven M. Nowick, Columbia University
Location: ELA 2
Category: Conferences - Seminars
There has been a resurgence of interest in asynchronous (i.e. clockless) digital design in recent years, as designers confront formidable challenges of high-speed clock distribution, chip complexity, power, design time, mixed-timing domains and reusability.
This talk is in two parts. In the first part, I will give a brief overview of asynchronous design, including its motivation and highlights of recent activity in industry (Intel, Boeing, Sun, Philips) and academia. I will also briefly survey some of my active research areas: (i) CAD tools for asynchronous circuits and systems, (ii) mixed-timing interfaces, and (iii) low-power delay-insensitive global communication.
In the second part, I will present a new asynchronous interconnection network for globally-asynchronous locally-synchronous (GALS) chip multiprocessors. The network eliminates the need for global clock distribution and can interface multiple synchronous timing domains operating at unrelated clock rates. In particular, two new highly concurrent asynchronous components are introduced, which provide simple routing and arbitration/merge functions. Post-layout simulations indicate that comparable recent synchronous router nodes, based on a latency-insensitive design style, consume 5.6x to 10.7x more energy per packet and occupy 2.8x to 6.4x more area than the new asynchronous nodes. Pre-layout system-level network simulations, using post-layout nodes, are then performed for the asynchronous network and for a fabricated synchronous network (at 800 MHz and 1.36 GHz) in an identical commercial 90nm technology. Under random traffic, the new network provides significantly lower latency and competitive throughput over the entire operating range of the 800 MHz network, and through mid-range traffic rates for the 1.36 GHz network, but with degradation at higher traffic rates. End-to-end latencies as low as 5.2 ns (at moderate load) were observed through 6 router nodes and 5 hops.
Simulations will also be presented for a GALS network running both random traffic and several parallel benchmark kernels, the latter co-simulated with a shared-memory parallel CMP architecture, followed by directions for further improvement.
Practical information
- General public
- Free
Contact
- Prof. Giovanni De Micheli