ECE Seminar Series – Nov 3 (Fri) @ 2:00pm: “NVIDIA NeMo Toolkit for Conversational AI: An Open Source Framework for Advanced Speech Models,” Jagadeesh Balam, Senior Research Manager & Taejin Park, Senior Research Scientist, NVIDIA

Date and Time
Location
Engineering Science Bldg (ESB), Room 1001
photo of balam and park

Come at 1:15pm for Cookies, Coffee and Conversation!
DISTINGUISHED LECTURE at the ECE SEMINAR SERIES

Abstract

Join us as we explore NVIDIA NeMo, an open-source toolkit tailored for the training and inference of cutting-edge speech models. This talk will highlight the latest trends in speech research. In particular, we will focus on multi-speaker automatic speech recognition (ASR) with NeMo, a system designed to produce transcriptions with individual speaker labels using a combination of speaker diarization and ASR models. Discover the intricacies of data simulation and the training procedures for speaker diarization and multi-speaker ASR, and understand how large language models (LLMs) are enhancing multi-speaker ASR capabilities. Lastly, we will offer a glimpse into the future of NVIDIA NeMo, emphasizing the expanding role of LLMs in advancing multi-speaker ASR and conversational AI in general.

Bios

Jagadeesh Balam earned his M.S. and Ph.D. in Electrical and Computer Engineering from the University of California, Santa Barbara in 2007. He pursued his Ph.D. under the supervision of Prof. Jerry Gibson. He is now a Senior Research Manager in the NeMo Speech AI team, where he leads the Multi-Speaker ASR and Speech Enhancement research.

Taejin Park earned his B.S. in Electrical Engineering in 2010 and his M.S. in Electrical Engineering and Computer Science in 2012 from Seoul National University. He later obtained a Ph.D. in Electrical Engineering and an M.S. in Computer Science from USC. Dr. Park is now a Senior Research Scientist at NVIDIA, specializing in machine learning and speech signal processing.

Hosted by: The ECE Seminar Series

Submitted by: Haewon Jeong <haewon@ucsb.edu>