PhD Defense: "Signal Coding Approaches for Spatial Audio"

Sina Zamani

March 12th (Tuesday), 3:00pm
Engineering Science Building (ESB), Room 2001

Advances in virtual reality have generated substantial interest in accurately reproducing and storing spatial audio in the higher order ambisonics (HOA) format. The recent standard for HOA compression, MPEG-H 3D Audio, applies singular value decomposition (SVD) to the input HOA data, then encodes each predominant sound component independently using a standard core audio codec. The residual signal is encoded in the ambisonic domain. The noted shortcomings of this approach are: (i) the occasional mismatch in principal components across blocks, and the resulting suboptimal transitions in the data fed to the audio coder; (ii) encoding only a few predominant components due to the prohibitive side information cost of specifying SVD basis vectors; (iii) ignoring spatial inter-channel masking effects; and (iv) the difficulty of perceptually optimizing the encoding parameters in both the SVD and ambisonic domains.
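
As a rough illustration of the decomposition step described above, the following sketch splits one block of HOA data into a few predominant components and a residual using a per-block, time-domain SVD. It is not the MPEG-H 3D Audio implementation: the function name `split_predominant`, the block size, the synthetic two-source test signal, and the choice of two predominant components are all assumptions made for the example, and no core codec or perceptual model is involved.

```python
import numpy as np

def split_predominant(hoa_block, num_predominant):
    """Split one HOA block (channels x samples) into predominant parts and a residual.

    Hypothetical helper for illustration only; not part of any standard.
    """
    # Thin SVD: hoa_block = u @ diag(s) @ vt
    u, s, vt = np.linalg.svd(hoa_block, full_matrices=False)
    # Spatial basis vectors (these would be sent as side information)
    basis = u[:, :num_predominant]
    # Component signals (these would be fed to a core audio codec)
    components = s[:num_predominant, None] * vt[:num_predominant]
    # Whatever the predominant components do not explain stays in the ambisonic domain
    residual = hoa_block - basis @ components
    return basis, components, residual

# Synthetic test block: 16 channels (3rd-order HOA), 1024 samples,
# two directional sources plus weak diffuse noise (assumed values).
rng = np.random.default_rng(0)
mixing = rng.standard_normal((16, 2))
sources = rng.standard_normal((2, 1024))
block = mixing @ sources + 0.01 * rng.standard_normal((16, 1024))

basis, components, residual = split_predominant(block, 2)
residual_ratio = np.sum(residual**2) / np.sum(block**2)
```

Because the basis is recomputed for every block, consecutive blocks can order or orient their components differently, which is the mismatch mentioned in shortcoming (i).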

In this talk, I will address these shortcomings and present my research on several alternative encoding architectures for compression of HOA data. The proposed frameworks employ frequency domain SVD, which ensures smooth transitions between frames and enables SVD adaptation to frequency. I will also discuss several modifications to the basis vector estimation framework that optimize rate-distortion performance by balancing the trade-off between adaptivity and side information cost in different applications. Finally, I will present subjective and objective evaluation results for the proposed approaches, which show significant performance improvements in terms of both compression gain and perceptual quality.

About Sina Zamani:

Sina is a PhD candidate in the Department of Electrical and Computer Engineering at the University of California, Santa Barbara. He received his M.S. degree in Electrical and Computer Engineering from UCSB in 2015, and his B.S. degree in Electrical Engineering from Sharif University of Technology, Tehran, Iran, in 2013. His research interests include spatial audio and audio and speech coding.

Hosted by: Professor Kenneth Rose