Less is More: Experiences in Deconstructing Hardware Coherence
Much of the complexity and overhead (directory, state bits, invalidations, broadcasts) of a typical hardware coherence implementation stems from the effort to make coherence “invisible” even to the strongest memory consistency models. In this talk, we show how a much simpler multicore coherence scheme, with no directory and no broadcasts, can outperform a directory protocol while avoiding its complexity and overhead, requiring only a data-race-free guarantee from software. Simplifying coherence in turn simplifies the entire on-chip memory system: for example, it allows unconstrained scaling to multiple buses, simple on-chip networks, and a new solution for efficient virtual-cache coherence. The simpler multicore memory organization yields significant area, energy, and performance benefits, making our approach a prime candidate for coherent, shared-virtual-memory, heterogeneous architectures.