Citation: ZHANG Lin, WANG Haitao, YANG Shuang, ZENG Xiangyang, CHEN Ke'an. Single-channel deep time-domain speech enhancement networks for cabin environments[J]. ACTA ACUSTICA, 2023, 48(4): 890-900. DOI: 10.15949/j.cnki.0371-0025.2023.04.012
A deep time-domain speech enhancement network combining parallel dilated convolution and group convolution is designed for the single-channel speech enhancement problem in cabin environments. The network builds on the classical convolutional time-domain audio separation network. In the enhancement layer, parallel dilated convolutions with different dilation factors process long-duration signals, extracting more of the low-frequency information carried by the signal envelope and suppressing the time-delay effects caused by noise reverberation. At the same time, speech detail is preserved, and the extraction accuracy of the harmonic information of speech and background noise contained in the waveform is increased. In addition, group convolution offsets the growth in network size caused by the parallel convolution operations, so the network maintains a small size and low computational complexity while achieving a good enhancement effect. Experiments on multiple types of aircraft cabin noise show that the designed network modules improve objective metrics over the baseline network, and comparisons with other common existing networks show that the method obtains better subjective and objective speech enhancement scores under cabin-environment data conditions, with lower distortion in line-spectrum components and narrow bands at high noise levels.
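The network architecture is defined in the paper's body rather than in this abstract, but the two operations named here are standard building blocks. As a minimal NumPy sketch (illustrative only; the function names and shapes are assumptions of this note, not the paper's implementation), a dilated 1-D convolution and a grouped pointwise convolution can be written as:

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Valid-mode 1-D convolution whose taps are `dilation` samples apart,
    widening the receptive field without adding parameters."""
    k = len(kernel)
    span = (k - 1) * dilation + 1          # samples covered by the dilated kernel
    n_out = len(x) - span + 1
    return np.array([
        sum(kernel[j] * x[i + j * dilation] for j in range(k))
        for i in range(n_out)
    ])

def grouped_conv1x1(x, weights, groups):
    """Pointwise convolution that mixes channels only inside each group,
    cutting parameters by a factor of `groups` versus a full 1x1 conv."""
    c = x.shape[0]
    gs = c // groups                        # channels per group
    out = np.empty_like(x)
    for g in range(groups):
        sl = slice(g * gs, (g + 1) * gs)
        out[sl] = weights[g] @ x[sl]        # (gs, gs) @ (gs, T)
    return out

if __name__ == "__main__":
    x = np.arange(10.0)
    # Two parallel branches over the same signal: identical kernels,
    # different dilation factors, hence different temporal context.
    b1 = dilated_conv1d(x, [0.5, 0.5], dilation=1)
    b4 = dilated_conv1d(x, [0.5, 0.5], dilation=4)
    print(len(b1), len(b4))                 # prints: 9 6
```

In the enhancement layer described above, several such dilated branches with different dilation factors would run in parallel on the same feature sequence and their outputs be combined, while group convolution keeps the parameter growth introduced by the parallel branches in check.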
[1] Chen Z, Wang R, Yin F, et al. Speech dereverberation method based on spectral subtraction and spectral line enhancement. Appl. Acoust., 2016; 112: 201-210 DOI: 10.1016/j.apacoust.2016.05.017
[2] Xiao K, Wang S, Wan M, et al. Radiated noise suppression for electrolarynx speech based on multiband time-domain amplitude modulation. IEEE/ACM Trans. Audio Speech Lang. Process., 2018; 26(9): 1585-1593 DOI: 10.1109/TASLP.2018.2834729
[3] Chen J, Benesty J, Huang Y, et al. New insights into the noise reduction Wiener filter. IEEE Trans. Audio Speech Lang. Process., 2006; 14(4): 1218-1234 DOI: 10.1109/TSA.2005.860851
[4] Ercelebi E. Speech enhancement based on the discrete Gabor transform and multi-notch adaptive digital filters. Appl. Acoust., 2004; 65(8): 739-762 DOI: 10.1016/j.apacoust.2004.02.004
[5] Sayoud A, Djendi M, Medahi S, et al. A dual fast NLMS adaptive filtering algorithm for blind speech quality enhancement. Appl. Acoust., 2018; 135: 101-110 DOI: 10.1016/j.apacoust.2018.02.002
[6] Surendran S, Kumar T K. Oblique projection and cepstral subtraction in signal subspace speech enhancement for colored noise reduction. IEEE/ACM Trans. Audio Speech Lang. Process., 2018; 26(12): 2328-2340 DOI: 10.1109/TASLP.2018.2864535
[7] Jin W, Liu X, Scordilis M S, et al. Speech enhancement using harmonic emphasis and adaptive comb filtering. IEEE Trans. Audio Speech Lang. Process., 2010; 18(2): 356-368 DOI: 10.1109/TASL.2009.2028916
[8] Dendrinos M, Bakamidis S, Carayannis G. Speech enhancement from noise: A regenerative approach. Speech Commun., 1991; 10: 45-57 DOI: 10.1016/0167-6393(91)90027-Q
[9] Cao B F, Li J Q, Li T. Comparative study of speech enhancement algorithms in strong noise environments. Noise and Vibration Control, 2010; 30: 55-58 DOI: 10.3969/j.issn.1006-1355.2010.03.015
[10] Park S R, Lee J. A fully convolutional neural network for speech enhancement. 18th Annual Conference of the International Speech Communication Association, ISCA, Stockholm, Sweden, 2017: 1465-1468
[11] Jansson A, Humphrey E, Montecchio N, et al. Singing voice separation with deep U-Net convolutional networks. 18th International Society for Music Information Retrieval Conference, ISMIR, Suzhou, China, 2017: 745-751
[12] Choi H S, Kim J H, Huh J, et al. Phase-aware speech enhancement with deep complex U-Net. 7th International Conference on Learning Representations, ICLR, New Orleans, USA, 2019
[13] Kong Q, Cao Y, Liu H, et al. Decoupling magnitude and phase estimation with deep ResUNet for music source separation. 22nd International Society for Music Information Retrieval Conference, ISMIR, Online, 2021: 342-349
[14] Nie L Z, Chen X Q, Zhao H M. Speech enhancement method combining magnitude-spectrum and power-spectrum dictionaries. ACTA ACUSTICA, 2021; 46(1): 81-91 DOI: 10.15949/j.cnki.0371-0025.2021.01.008
[15] Bai H J, Zhang T Q, Liu J X, et al. Single-channel speech enhancement method combining accurate ratio masking and deep neural networks. ACTA ACUSTICA, 2022; 47(3): 394-404 DOI: 10.15949/j.cnki.0371-0025.2022.03.009
[16] Wu R Q, Chen X Q, Yu J, et al. Application of an improved U-Net with attention mechanism to end-to-end speech enhancement. ACTA ACUSTICA, 2022; 47(2): 266-275 DOI: 10.15949/j.cnki.0371-0025.2022.02.011
[17] Yin D C, Luo C, Xiong Z W, et al. PHASEN: A phase-and-harmonics-aware speech enhancement network. 34th AAAI Conference on Artificial Intelligence, AAAI, New York, USA, 2020
[18] Luo Y, Mesgarani N. TasNet: Time-domain audio separation network for real-time, single-channel speech separation. 43rd IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, Calgary, Canada, 2018: 696-700
[19] Luo Y, Mesgarani N. Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation. IEEE/ACM Trans. Audio Speech Lang. Process., 2019; 27(8): 1256-1266 DOI: 10.1109/TASLP.2019.2915167
[20] Zuo K C, Chen P, Wang Z, et al. Research status of aircraft cabin interior noise. Acta Aeronautica et Astronautica Sinica, 2016; 37(8): 2370-2384 DOI: 10.7527/S1000-6893.2016.0073
[21] Lu Y, Wang F, Ma X. Helicopter interior noise reduction using compounded periodic struts. J. Sound Vib., 2018; 435: 264-280 DOI: 10.1016/j.jsv.2018.07.024
[22] Ma D X, Li P Z, Dai G H, et al. Shock noise of choked jets. ACTA ACUSTICA, 1980; 5(3): 172-182 DOI: 10.15949/j.cnki.0371-0025.1980.03.002
[23] Cui A J, Li D J, Zhou K, et al. Low-frequency signal synthesis method under array structures. Acta Physica Sinica, 2020; 69(19): 194101 DOI: 10.7498/aps.69.20200501
[24] Ren Z, Kong Q, Han J, et al. Attention-based atrous convolutional neural networks: Visualisation and understanding perspectives of acoustic scenes. 44th IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, Brighton, UK, 2019: 56-60
[25] Xian Y, Sun Y, Wang W W, et al. Convolutional fusion network for monaural speech enhancement. Neural Netw., 2021; 143: 97-107 DOI: 10.1016/j.neunet.2021.05.017
[26] Hu Y, Loizou P C. Evaluation of objective quality measures for speech enhancement. IEEE Trans. Audio Speech Lang. Process., 2008; 16(1): 229-238 DOI: 10.1109/TASL.2007.911054
[27] Taal C H, Hendriks R C, Heusdens R, et al. An evaluation of objective measures for intelligibility prediction of time-frequency weighted noisy speech. J. Acoust. Soc. Am., 2011; 130(5): 3013-3027 DOI: 10.1121/1.3641373
[28] Hu Y, Loizou P C. Subjective comparison and evaluation of speech enhancement algorithms. Speech Commun., 2007; 49: 588-601 DOI: 10.1016/j.specom.2006.12.006