"Scaling Software and Hardware for Thousand-core Systems"

Daniel Sanchez, Stanford University

February 16th (Thursday), 10:00am
Computer Science Conference Room (HFH 1132)

Scaling multicores to thousands of cores efficiently requires significant innovation across the software-hardware stack. On one hand, to expose ample parallelism, many applications will need to be divided into fine-grain tasks of a few thousand instructions each, and scheduled dynamically in a manner that addresses the three major difficulties of fine-grain parallelism: locality, load imbalance, and excessive overheads. On the other hand, hardware resources must scale efficiently, even as some of them are shared among thousands of threads. In particular, the memory hierarchy is hard to scale in several ways: conventional cache coherence techniques are prohibitively expensive beyond a few tens of cores, and caches cannot be easily shared among multiple threads or processes. Ideally, software should be able to configure these shared resources to provide good overall performance and quality of service (QoS) guarantees under all possible sharing scenarios.
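To make the tension among locality, load imbalance, and overheads concrete, the following is a minimal, single-threaded sketch of one common form of dynamic fine-grain scheduling (work stealing with per-worker deques). It is a generic illustration, not the scheduler presented in the talk; all names and parameters here are hypothetical.

```python
from collections import deque
import random

def run_with_work_stealing(tasks, n_workers=4, seed=0):
    # Illustrative sketch only: each simulated worker keeps a local
    # deque of tasks. Popping from the local tail (LIFO) tends to
    # preserve locality; idle workers steal from another worker's
    # head (FIFO) to correct load imbalance. The per-task bookkeeping
    # is the overhead that must stay small at fine granularity.
    rng = random.Random(seed)
    deques = [deque() for _ in range(n_workers)]
    for i, task in enumerate(tasks):          # round-robin initial split
        deques[i % n_workers].append(task)
    results = []
    remaining = len(tasks)
    while remaining:
        for w in range(n_workers):
            if deques[w]:
                task = deques[w].pop()        # local LIFO pop: locality
            else:
                victims = [v for v in range(n_workers) if deques[v]]
                if not victims:
                    continue
                task = deques[rng.choice(victims)].popleft()  # steal
            results.append(task())
            remaining -= 1
    return results
```

A real runtime would execute workers concurrently and make stealing decisions based on richer information (the talk argues for using parallelism, locality, and heterogeneity hints from the programming model); this sketch only shows the basic deque mechanics.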

In this talk, I will present several techniques to scale both software and hardware. First, I will describe a scheduler that uses high-level information from the programming model about parallelism, locality, and heterogeneity to perform scheduling dynamically and at fine granularity to avoid load imbalance. This fine-grain scheduler can use lightweight, flexible hardware support to keep overheads small as we scale up. Second, I will present a set of techniques that, together, enable scalable memory hierarchies that can be shared efficiently: ZCache, a cache design that achieves high associativity cheaply (e.g., 64-way associativity with the latency, energy, and area of a 4-way cache) and is characterized by simple and accurate analytical models; Vantage, a cache partitioning technique that leverages the analytical guarantees of ZCache to implement scalable and efficient partitioning, enabling hundreds of threads to share the cache in a controlled manner, providing configurability and isolation; and SCD, which leverages ZCache to implement scalable cache coherence with QoS guarantees.
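The core idea behind ZCache-style designs can be illustrated in software: give each way its own hash function so a block has one candidate position per way, and on a conflict, consider relocating the current occupants to their own alternative positions (as in cuckoo hashing), which expands the set of replacement candidates well beyond the number of ways. The sketch below is a simplified software analogy under my own assumptions, not the hardware design from the talk, and it only performs a one-level relocation walk.

```python
class ZCacheSketch:
    # Simplified software analogy of a multi-hash cache. A real design
    # would use hardware hash functions, replacement priorities (e.g.,
    # LRU ranks) to pick the best candidate, and deeper walks.
    def __init__(self, ways=4, sets=16):
        self.ways = ways
        self.sets = sets
        # table[w][s] holds the address cached in way w, set s (or None)
        self.table = [[None] * sets for _ in range(ways)]

    def _positions(self, addr):
        # One candidate set index per way, via a per-way hash
        return [hash((w, addr)) % self.sets for w in range(self.ways)]

    def lookup(self, addr):
        return any(self.table[w][s] == addr
                   for w, s in enumerate(self._positions(addr)))

    def insert(self, addr):
        if self.lookup(addr):
            return
        pos = self._positions(addr)
        for w, s in enumerate(pos):           # use a free way if any
            if self.table[w][s] is None:
                self.table[w][s] = addr
                return
        # One-level relocation walk: try to move one of the W occupants
        # to one of ITS alternative positions, expanding the candidate
        # set beyond W -- the source of the high effective associativity.
        for w, s in enumerate(pos):
            victim = self.table[w][s]
            for vw, vs in enumerate(self._positions(victim)):
                if vw != w and self.table[vw][vs] is None:
                    self.table[vw][vs] = victim   # relocate occupant
                    self.table[w][s] = addr       # install new block
                    return
        # No slot found in the expanded set: evict a candidate outright
        self.table[0][pos[0]] = addr
```

Because each replacement draws candidates from this larger, nearly independent set, the quality of the eviction decision resembles that of a much more associative cache, which is also what makes the design amenable to simple analytical models.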

About Daniel Sanchez:

Daniel Sanchez is a PhD candidate in the Electrical Engineering Department at Stanford University. His research focuses on large-scale multicores, specifically on scalable and dynamic fine-grain runtimes and schedulers, hardware support for scheduling, scalable and efficient memory hierarchies, and architectures with QoS guarantees. He earned an MS in Electrical Engineering from Stanford and a BS in Telecommunication Engineering from the Technical University of Madrid (UPM).

Hosted by: Computer Engineering / HP Labs Colloquium Series and ECE Chair, Jerry Gibson