Mask estimation method in the spherical harmonic domain used by adaptive beamforming for speech enhancement
-
Abstract
A mask estimation method for adaptive beamforming with spherical microphone arrays is proposed. The method first extracts a low-dimensional spatial vector containing the spatial information from the spherical harmonic coefficients of the received signals, and then employs a Complex Gaussian Mixture Model (CGMM) or a deep learning network to estimate the mask. Finally, the estimated mask is used to design a Minimum Variance Distortionless Response (MVDR) beamformer so that directional interference can be suppressed. Simulation results show that the computational complexity of the proposed method is one order of magnitude lower than that of the conventional method operating in the microphone domain, and that the corresponding MVDR beamformer achieves much better performance in terms of Perceptual Evaluation of Speech Quality (PESQ), segmental Signal-to-Noise Ratio (segSNR), and Short-Time Objective Intelligibility (STOI) in most acoustic scenarios, especially when the Signal-to-Noise Ratio (SNR) is relatively low. The maximum improvements in these three objective metrics are about 1.31 dB, 4.54 dB, and 35%, respectively. In addition, experiments conducted in a real acoustic environment indicate that the proposed method achieves greater noise reduction than the conventional method without degrading speech intelligibility.
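The last stage of the pipeline summarized above, designing an MVDR beamformer from an estimated mask, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function name `mask_based_mvdr`, the array layout, the diagonal loading, and the choice of the principal eigenvector of the speech covariance as the steering vector are all assumptions made for illustration. The mask is taken as given (for example, from a CGMM or a network, neither of which is shown), and the "channels" may be microphone signals or spherical harmonic domain spatial vectors.

```python
import numpy as np

def mask_based_mvdr(Y, speech_mask, loading=1e-6):
    """Hypothetical helper: mask-driven MVDR in the time-frequency domain.

    Y           : complex array (F, T, C), F frequency bins, T frames, C channels
    speech_mask : real array (F, T) with values in [0, 1]
    Returns (output (F, T), weights (F, C), steering vectors (F, C)).
    """
    F, T, C = Y.shape
    noise_mask = 1.0 - speech_mask
    out = np.zeros((F, T), dtype=complex)
    weights = np.zeros((F, C), dtype=complex)
    steering = np.zeros((F, C), dtype=complex)

    for f in range(F):
        Yf = Y[f]  # (T, C)
        # Mask-weighted spatial covariances: sum_t m(t) y(t) y(t)^H / sum_t m(t)
        Rs = (Yf.T * speech_mask[f]) @ Yf.conj() / (speech_mask[f].sum() + 1e-12)
        Rn = (Yf.T * noise_mask[f]) @ Yf.conj() / (noise_mask[f].sum() + 1e-12)
        # Diagonal loading keeps the noise covariance invertible (assumption).
        Rn = Rn + (loading * np.trace(Rn).real / C + 1e-12) * np.eye(C)

        # Assumed steering vector: principal eigenvector of the speech covariance.
        # eigh returns eigenvalues in ascending order, so take the last column.
        _, vecs = np.linalg.eigh(Rs)
        d = vecs[:, -1]

        # MVDR weights: w = Rn^{-1} d / (d^H Rn^{-1} d)
        a = np.linalg.solve(Rn, d)
        w = a / np.vdot(d, a)

        out[f] = Yf @ w.conj()  # w^H y(t) for every frame
        weights[f] = w
        steering[f] = d

    return out, weights, steering
```

By construction, the weights satisfy the distortionless constraint w^H d = 1 in every frequency bin, so a signal arriving along the assumed steering vector passes unchanged while mask-identified noise and interference are minimized.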
-
-