Feb 16 (Wed) @ 10:00am: "Sparse Matrices and High-Performance Computing Meet Biology," Giulia Guidi, PhD Candidate, UC Berkeley
Genomics and many other scientific disciplines are facing exponential growth in data as instrumentation becomes better and less expensive, overwhelming conventional analytical infrastructures. As a result, scalable parallel systems are being used, whether traditional high-performance computing (HPC) systems or the cloud, but programming them remains a challenge largely limited to parallel computing experts.
In this context, we have developed a novel set of genomics algorithms for de novo genome assembly (i.e., reconstruction of an unknown genome from redundant, erroneous genomic sequences). These algorithms are integrated into the diBELLA 2D software package and are based on sparse matrix multiplication with support for a general semiring abstraction. This enables the creation and easy modification of powerful genomics pipelines that take advantage of massively parallel hardware without exposing low-level architectural details. diBELLA 2D is up to 2x faster on hundreds of nodes than a 1D algorithm based on distributed hash tables, which are more difficult to parallelize. diBELLA 2D also integrates GPU support in the most compute-intensive stages of the pipeline to take advantage of today's heterogeneous HPC hardware.
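As a rough illustration of the idea in the abstract above, and not diBELLA 2D's actual code, the following minimal single-node sketch builds a sparse reads-by-k-mers matrix and multiplies it by its transpose with user-supplied "add" and "multiply" operations. All function names, the dictionary-based matrix layout, and the toy reads are assumptions made for this example; swapping the two operations changes what each output entry means (a shared k-mer count, or the list of shared k-mer positions) without touching the multiplication routine.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield (k-mer, position) for every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k], i

def build_read_by_kmer(reads, k):
    """Sparse reads-by-k-mers matrix: {read_id: {kmer: first position}}."""
    A = {}
    for rid, seq in enumerate(reads):
        row = {}
        for kmer, pos in kmers(seq, k):
            row.setdefault(kmer, pos)  # keep only the first occurrence
        A[rid] = row
    return A

def transpose(A):
    """Transpose a dict-of-dicts sparse matrix."""
    T = defaultdict(dict)
    for i, row in A.items():
        for j, v in row.items():
            T[j][i] = v
    return dict(T)

def spgemm(A, B, add, mul):
    """C = A * B, row by row, using caller-supplied add and mul."""
    C = {}
    for i, a_row in A.items():
        acc = {}
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():
                prod = mul(a_ik, b_kj)
                acc[j] = add(acc[j], prod) if j in acc else prod
        if acc:
            C[i] = acc
    return C

# Hypothetical toy input: three short reads, k-mer length 3.
reads = ["ACGTAC", "GTACGG", "TTTTTT"]
A = build_read_by_kmer(reads, k=3)
At = transpose(A)

# Operations 1: count the k-mers shared by each pair of reads.
counts = spgemm(A, At, add=lambda x, y: x + y, mul=lambda a, b: 1)

# Operations 2: collect (position in read i, position in read j) pairs.
positions = spgemm(A, At, add=lambda x, y: x + y, mul=lambda a, b: [(a, b)])

print(counts[0][1])
print(sorted(positions[0][1]))
```

Reads 0 and 1 share k-mers, so they receive an off-diagonal entry, while read 2 shares nothing with the others and has only its diagonal entry. A distributed implementation would partition the matrices across nodes; this sketch leaves that out entirely.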
To ensure that the genomics research community, and others more broadly, can benefit from HPC, the development of distributed algorithms such as diBELLA 2D must be coupled with efforts to make distributed computing more accessible. Traditional HPC systems are typically allocated to specific research communities and have long user wait times, which limits access to resources and hinders scientific discovery. To this end, we have shown that we are on the cusp of a paradigm shift in HPC, away from purely institutional or agency-wide systems and toward cloud computing, as the latter has made significant advances in networking technology and HPC system software.
Giulia Guidi is a Ph.D. candidate in Computer Science at UC Berkeley, advised by Aydın Buluç and Kathy Yelick. Giulia is a 2020 SIGHPC Computational & Data Science Fellow. Her work is in the area of computer systems research, including cloud and parallel computing, and she is interested in building a collaborative interdisciplinary research program. Giulia works on the challenges of large-scale computational biology, on the algorithms and software infrastructures that meet the usability and performance demands of this community, and on how to make cloud computing more accessible for high-performance scientific computing. Giulia's research goal is to make writing high-performance scientific code as easy as writing high-performance deep learning code through the use of powerful abstractions. Giulia is generally interested in the intersection of High-Performance Computing (HPC), Computer Systems, and Computational Biology as enabling technologies for faster, higher-quality scientific discovery.
Hosted by: ECE Computer Engineering
Submitted by: Libby Straight <email@example.com>