PhD Defense: "Yield, Cost, Reliability, and Availability of Multi-Core System-on-Chips"

Saeed Shamshiri

May 6th (Friday), 4:00pm
Harold Frank Hall (HFH), Rm 4164

Aggressive technology scaling has magnified reliability challenges by increasing the number of permanent and transient faults through accelerated aging, increased device variation, and significantly reduced noise margins. In this thesis, we address the key yield and reliability challenges of NoC-based SoCs, which include cores, on-chip communications, and on-chip memories. Our yield and cost analysis shows that adding a limited number of spare cores and wires to replace defective ones, either before shipment or in the field, can significantly improve the effective yield, in-field availability, and overall cost of the system, and can eliminate the burn-in process. We also propose a quality metric for on-chip communication that can be used along with frequency binning to price the chip in the market. We demonstrate that the overall quality of a mesh-based NoC depends more on the reliability of the inner links; hence, a non-uniform spare wire distribution is more effective than a uniform one.

For the reliability of on-chip memories, we propose error-locality-aware codes that correct single-bit or multi-bit upsets as well as physical defects in SRAM cells. At the same cost as Golay and BCH codes, our proposed codes provide better reliability against multi-bit upsets. We also propose an interleaved error-locality-aware code for end-to-end error correction in on-chip communications. To maintain the error correction capability of the code for transient and intermittent errors, we further propose an end-to-end data gathering and online diagnosis approach that locates defective wires and replaces them with spare wires embedded in the network.

About Saeed Shamshiri:

Saeed Shamshiri received his Bachelor's and Master's degrees from the University of Tehran, Iran. He is currently a Ph.D. candidate at the University of California, Santa Barbara. His current research lies in the area of yield, cost, reliability, and availability of multi-core SoCs. He has published over 20 papers in the field of computer architecture and hardware.

Hosted by: Professor Kwang-Ting (Tim) Cheng