Events

PhD Defense: "Low Delay, Low Complexity Multimode Tree Coding and Practical Rate Distortion Bounds for Speech"

Ying-Yi Li

December 13th (Thursday), 1:00pm
Harold Frank Hall, Rm 4164 (ECE Conference Room)


A low-delay and low-complexity Multimode Tree coder with perceptual pre- and post-weighting and backward pitch prediction for both narrowband and wideband speech is developed. In addition, we develop composite source models for both narrowband and wideband speech, and apply these to classical rate distortion theory. Since classical rate distortion theory is based on MSE distortion, we generate mapping functions by calculating the MSE and PESQ/WPESQ pairs from ADPCM coders. As a result, the performance of a standardized speech codec can be compared with rate distortion bounds based on PESQ/WPESQ distortion.

Our experimental results show that perceptual pre- and post-weighting filters and backward pitch prediction do improve speech quality without increasing bit rate or delay for voiced speech. Compared with standardized narrowband speech codecs, the worst-case complexity of the Multimode Tree coder is one-third that of AMR-NB and one-eighth that of G.728, and its delay is a quarter that of AMR-NB. Compared with standardized wideband speech codecs, the worst-case computational complexity of the Multimode Tree coder is one-third that of AMR-WB, and its delay is half that of AMR-WB and one-third that of G.722.1.

In addition, composite source models for both narrowband and wideband speech are developed. To generate the mapping functions between MSE and PESQ/WPESQ, we use G.726/G.727 for the narrowband mapping and construct a wideband ADPCM coder based on G.726 and G.727 for the wideband mapping. The rate distortion bounds calculated from the composite source models under MSE distortion are then converted to PESQ/WPESQ distortion using these mapping functions. The performance of standardized speech codecs can therefore be compared with rate distortion bounds based on PESQ/WPESQ distortion.

About Ying-Yi Li:

Ying-Yi Li received her B.S. and M.S. in Computer Science from National Tsing Hua University, Hsinchu, Taiwan in 2005 and 2007, respectively. Ying-Yi is currently pursuing her Ph.D. in Prof. Jerry Gibson's VivoNets Lab. Her research interests lie in voice communications focusing on low-delay and low-complexity speech coding.

Hosted by: Professor Jerry Gibson