Feb 28 (Mon) @ 10:00am: "Building Reliable ML Systems: from machines to humans," Haewon Jeong, Postdoc Fellow, Harvard

Date and Time: Feb 28 (Mon), 10:00am
Location: Zoom Meeting



Machine learning (ML) algorithms are increasingly deployed in applications that have high-stakes impacts on people (e.g., education, hiring), and it is crucial to ensure the reliability of such ML systems. My research considers reliability in two categories: reliability in machine metrics (e.g., computation time, accuracy) and reliability in human metrics (e.g., fairness, accountability). In this talk, I will discuss recent results that address reliability challenges in both categories, as well as exciting future directions towards building end-to-end reliability, from machines to humans.

In the first part of the talk, I will discuss my work towards building reliability in machine metrics, especially when we train a model on large-scale distributed systems (e.g., high-performance computing (HPC) clusters). When we utilize thousands of distributed nodes for computing, slow nodes, unresponsive nodes, or bit flips can increase the overall computation time and degrade the accuracy of the output. I will introduce coded computing, an emerging subarea of information theory that combines traditional coding theory and distributed ML algorithms. One of the foundational building blocks of large-scale ML computation is distributed matrix multiplication. I developed MatDot codes that meet the fundamental limit and solve the longstanding intellectual problem of computing matrix multiplication reliably. In collaboration with the Oak Ridge National Lab, we translated this theoretical breakthrough into a practical parallel matrix multiplication algorithm, 3D SUMMA, bringing coded computing closer to HPC practitioners.
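
For readers unfamiliar with coded computing, the following is a minimal toy sketch of the MatDot idea, assuming the standard polynomial construction from the coded-computing literature: A is split into m column blocks and B into m row blocks, each worker multiplies one evaluation of two encoding polynomials, and the product AB is the coefficient of x^(m-1) in the result, recoverable from any 2m-1 workers. The function names (encode, matdot_worker_outputs, matdot_decode) are hypothetical, and the code uses small exact-arithmetic matrices on one machine; it is not the HPC implementation described in the talk.

```python
from fractions import Fraction

def matmul(X, Y):
    """Plain dense matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def encode(blocks, powers, x):
    """Evaluate the matrix polynomial sum_i blocks[i] * x**powers[i] at x."""
    rows, cols = len(blocks[0]), len(blocks[0][0])
    out = [[0] * cols for _ in range(rows)]
    for block, p in zip(blocks, powers):
        for r in range(rows):
            for c in range(cols):
                out[r][c] += block[r][c] * x ** p
    return out

def matdot_worker_outputs(A, B, m, points):
    """Simulate one worker per evaluation point (assumes m divides the inner dimension)."""
    step = len(B) // m
    A_blocks = [[row[i * step:(i + 1) * step] for row in A] for i in range(m)]  # column blocks
    B_blocks = [B[i * step:(i + 1) * step] for i in range(m)]                   # row blocks
    outputs = {}
    for x in points:
        pA = encode(A_blocks, [i for i in range(m)], x)          # A_0 + A_1 x + ...
        pB = encode(B_blocks, [m - 1 - j for j in range(m)], x)  # B_0 x^(m-1) + ... + B_(m-1)
        outputs[x] = matmul(pA, pB)
    return outputs

def lagrange_coeff(xs, k, degree):
    """Coefficient of x**degree in the k-th Lagrange basis polynomial over xs."""
    coeffs, denom = [Fraction(1)], Fraction(1)
    for j, xj in enumerate(xs):
        if j == k:
            continue
        new = [Fraction(0)] * (len(coeffs) + 1)
        for d, c in enumerate(coeffs):  # multiply current polynomial by (x - xj)
            new[d] -= c * xj
            new[d + 1] += c
        coeffs = new
        denom *= xs[k] - xj
    return coeffs[degree] / denom

def matdot_decode(outputs, m):
    """Recover AB (the x^(m-1) coefficient) from any 2m-1 worker outputs."""
    xs = list(outputs)[:2 * m - 1]
    if len(xs) < 2 * m - 1:
        raise ValueError("need at least 2m-1 worker outputs")
    rows, cols = len(outputs[xs[0]]), len(outputs[xs[0]][0])
    result = [[Fraction(0)] * cols for _ in range(rows)]
    for k, x in enumerate(xs):
        w = lagrange_coeff(xs, k, m - 1)
        for r in range(rows):
            for c in range(cols):
                result[r][c] += w * outputs[x][r][c]
    return result

# Toy run: 5 workers, m = 2, so any 3 responses suffice; workers 1 and 3 straggle.
A = [[1, 2, 3, 4], [5, 6, 7, 8]]
B = [[1, 0], [0, 1], [2, 1], [1, 3]]
outputs = matdot_worker_outputs(A, B, 2, [1, 2, 3, 4, 5])
survivors = {x: outputs[x] for x in (2, 4, 5)}
print(matdot_decode(survivors, 2) == matmul(A, B))
```

In this toy setup, the two dropped workers stand in for slow or unresponsive nodes: the decoder never needs their results, which is the sense in which the redundancy tolerates stragglers.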

In the second part, I will delve into improving reliability in human metrics. I ask a novel question that has significant practical impact: how do we learn a fair model when data has missing values? Even though there are numerous fairness intervention methods in the literature, most of them require a complete training set as input. In practice, data can have missing values, and the patterns of missing data can depend on group attributes (e.g., gender or race). We derive a fundamental limit showing that simply applying fair ML methods after data imputation is insufficient and that there is no universally fair imputation method for different downstream learning tasks. Guided by this theory, I propose a decision-tree-based approach that can learn a fair model by incorporating data imputation and learning into a single algorithm. I apply the algorithm to education datasets and show that it outperforms state-of-the-art fair ML methods. Further, I discuss the downstream effects of fair decisions in the context of education. Finally, I will tie these together and present my research vision for building reliable machine learning systems.


Haewon Jeong is a postdoctoral fellow at the Harvard John A. Paulson School of Engineering and Applied Sciences. She received her B.S. degree from KAIST in 2014 and her Ph.D. degree from Carnegie Mellon University in 2020. Her research centers on building reliable machine learning systems. Her thesis connects the classical theory of error-correcting codes with distributed computing and proposes reliable large-scale computing strategies, including her recent work on an optimal matrix multiplication scheme and its application to practical machine learning algorithms. Her current research interests include understanding reliability in human metrics (e.g., fairness, accountability). She actively collaborates with social scientists to develop fair machine learning algorithms, especially for applications in education. She won the 2014 NSDI Community Award, the 2014 Samsung HumanTech Paper Award, the gold prize for Graduation Day talks at the 2020 Workshop on Information Theory and its Applications, and the 2021 Harvard Data Science Postdoc Fellowship.

Hosted by: ECE Computer Engineering

Submitted by: Libby Straight <libby@ece.ucsb.edu>