
Citation: KE Yuxuan, LI Jian, PENG Renhua, ZHENG Chengshi, LI Xiaodong. Mask estimation method in the spherical harmonic domain used by adaptive beamforming for speech enhancement[J]. ACTA ACUSTICA, 2021, 46(1): 67-80. DOI: 10.15949/j.cnki.0371-0025.2021.01.007

Mask estimation method in the spherical harmonic domain used by adaptive beamforming for speech enhancement

  • Abstract: A mask estimation method is proposed for adaptive beamforming with spherical microphone arrays. The method first extracts a low-dimensional spatial vector from the spherical harmonic coefficients of the received signals, which carry the spatial information, and then estimates the mask with either a Complex Gaussian Mixture Model (CGMM) or a deep learning network. Finally, the estimated mask is used to design a Minimum Variance Distortionless Response (MVDR) beamformer that performs spatial filtering and suppresses directional interference. Theoretical analysis and simulations show that, for acoustic signals of the same duration, the computational complexity of mask estimation in the spherical harmonic domain is an order of magnitude lower than that of the conventional microphone-domain method. In most acoustic scenarios, and especially at low Signal-to-Noise Ratio (SNR), the resulting MVDR beamformer also scores clearly higher on Perceptual Evaluation of Speech Quality (PESQ), segmental Signal-to-Noise Ratio (segSNR), and Short-Time Objective Intelligibility (STOI), with maximum improvements of about 1.31 (PESQ score), 4.54 dB, and 35%, respectively. Measurements in a real acoustic environment further indicate that the proposed method achieves more noise reduction than the conventional microphone-domain method without degrading speech intelligibility.
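
    For readers unfamiliar with mask-driven beamforming, the last step named in the abstract can be sketched with the textbook mask-based MVDR formulation. This is a generic illustration, not necessarily the exact formulation used in the paper. All symbols are assumptions introduced here: x(t,f) is the multichannel observation vector at frame t and frequency f (the abstract does not say whether the beamformer operates on microphone signals or on spherical harmonic coefficients), M_s and M_n are the estimated speech and noise masks with values in [0, 1], and d(f) is a steering vector, for which the principal eigenvector of the speech covariance is one common choice.

```latex
% Mask-weighted spatial covariance estimates for speech (s) and noise (n)
\mathbf{\Phi}_{\nu}(f) =
  \frac{\sum_{t} M_{\nu}(t,f)\,\mathbf{x}(t,f)\,\mathbf{x}^{\mathrm{H}}(t,f)}
       {\sum_{t} M_{\nu}(t,f)},
  \qquad \nu \in \{s, n\}

% MVDR weights: minimize the noise power subject to w^H d = 1
\mathbf{w}(f) =
  \frac{\mathbf{\Phi}_{n}^{-1}(f)\,\mathbf{d}(f)}
       {\mathbf{d}^{\mathrm{H}}(f)\,\mathbf{\Phi}_{n}^{-1}(f)\,\mathbf{d}(f)}

% Beamformer output
y(t,f) = \mathbf{w}^{\mathrm{H}}(f)\,\mathbf{x}(t,f)
```

    Substituting the weights into the constraint gives w^H d = 1, so a signal arriving with steering vector d passes undistorted while the mask-weighted noise power w^H Φ_n w is minimized. In this formulation the mask estimate determines both Φ_n and, through Φ_s, the steering vector.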

     
