Citation: ZHANG Lin, WANG Haitao, YANG Shuang, ZENG Xiangyang, CHEN Ke'an. Single-channel deep time-domain speech enhancement networks for cabin environments[J]. ACTA ACUSTICA, 2023, 48(4): 890-900. DOI: 10.15949/j.cnki.0371-0025.2023.04.012
A deep time-domain speech enhancement network combining parallel dilated convolution and group convolution is designed for the single-channel speech enhancement problem in cabin environments. The network builds on the classical convolutional time-domain audio separation network. In the enhancement layer, parallel dilated convolutions with different dilation factors process long-duration signals, extracting more of the low-frequency information carried by the signal envelope and suppressing the time-delay effects caused by noise reverberation. At the same time, speech detail is preserved, and the extraction accuracy of the harmonic information of speech and background noise contained in the waveform is increased. In addition, group convolution offsets the growth in network size caused by the parallel convolution operations, so the network maintains a small size and low computational complexity while achieving a good enhancement effect. Experiments on multiple types of aircraft cabin noise show that the designed network modules improve objective metrics over the baseline network, and comparisons with other common existing networks show that the method obtains better subjective and objective speech enhancement scores under cabin-environment data conditions, with lower distortion in line-spectrum components and narrow bands at high noise levels.
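The network architecture is defined in the paper's body rather than in this abstract, but the two operations named here are standard building blocks. As a minimal NumPy sketch (illustrative only; the function names and shapes are assumptions of this note, not the paper's implementation), a dilated 1-D convolution and a grouped pointwise convolution can be written as:

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Valid-mode 1-D convolution whose taps are `dilation` samples apart,
    widening the receptive field without adding parameters."""
    k = len(kernel)
    span = (k - 1) * dilation + 1          # samples covered by the dilated kernel
    n_out = len(x) - span + 1
    return np.array([
        sum(kernel[j] * x[i + j * dilation] for j in range(k))
        for i in range(n_out)
    ])

def grouped_conv1x1(x, weights, groups):
    """Pointwise convolution that mixes channels only inside each group,
    cutting parameters by a factor of `groups` versus a full 1x1 conv."""
    c = x.shape[0]
    gs = c // groups                        # channels per group
    out = np.empty_like(x)
    for g in range(groups):
        sl = slice(g * gs, (g + 1) * gs)
        out[sl] = weights[g] @ x[sl]        # (gs, gs) @ (gs, T)
    return out

if __name__ == "__main__":
    x = np.arange(10.0)
    # Two parallel branches over the same signal: identical kernels,
    # different dilation factors, hence different temporal context.
    b1 = dilated_conv1d(x, [0.5, 0.5], dilation=1)
    b4 = dilated_conv1d(x, [0.5, 0.5], dilation=4)
    print(len(b1), len(b4))                 # prints: 9 6
```

In the enhancement layer described above, several such dilated branches with different dilation factors would run in parallel on the same feature sequence and their outputs be combined, while group convolution keeps the parameter growth introduced by the parallel branches in check.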
[1] Chen Z, Wang R, Yin F, et al. Speech dereverberation method based on spectral subtraction and spectral line enhancement. Appl. Acoust., 2016; 112: 201-210 DOI: 10.1016/j.apacoust.2016.05.017
[2] Xiao K, Wang S, Wan M, et al. Radiated noise suppression for electrolarynx speech based on multiband time-domain amplitude modulation. IEEE/ACM Trans. Audio Speech Lang. Process., 2018; 26(9): 1585-1593 DOI: 10.1109/TASLP.2018.2834729
[3] Chen J, Benesty J, Huang Y, et al. New insights into the noise reduction Wiener filter. IEEE Trans. Audio Speech Lang. Process., 2006; 14(4): 1218-1234 DOI: 10.1109/TSA.2005.860851
[4] Ercelebi E. Speech enhancement based on the discrete Gabor transform and multi-notch adaptive digital filters. Appl. Acoust., 2004; 65(8): 739-762 DOI: 10.1016/j.apacoust.2004.02.004
[5] Sayoud A, Djendi M, Medahi S, et al. A dual fast NLMS adaptive filtering algorithm for blind speech quality enhancement. Appl. Acoust., 2018; 135: 101-110 DOI: 10.1016/j.apacoust.2018.02.002
[6] Surendran S, Kumar T K. Oblique projection and cepstral subtraction in signal subspace speech enhancement for colored noise reduction. IEEE/ACM Trans. Audio Speech Lang. Process., 2018; 26(12): 2328-2340 DOI: 10.1109/TASLP.2018.2864535
[7] Jin W, Liu X, Scordilis M S, et al. Speech enhancement using harmonic emphasis and adaptive comb filtering. IEEE Trans. Audio Speech Lang. Process., 2010; 18(2): 356-368 DOI: 10.1109/TASL.2009.2028916
[8] Dendrinos M, Bakamidis S, Carayannis G. Speech enhancement from noise: A regenerative approach. Speech Commun., 1991; 10: 45-57 DOI: 10.1016/0167-6393(91)90027-Q
[9] Cao B F, Li J Q, Li T. Comparative study of speech enhancement algorithms in strong noise environments. Noise and Vibration Control, 2010; 30: 55-58 DOI: 10.3969/j.issn.1006-1355.2010.03.015
[10] Park S R, Lee J. A fully convolutional neural network for speech enhancement. 18th Annual Conference of the International Speech Communication Association, ISCA, Stockholm, Sweden, 2017: 1465-1468
[11] Jansson A, Humphrey E, Montecchio N, et al. Singing voice separation with deep U-Net convolutional networks. 18th International Society for Music Information Retrieval Conference, ISMIR, Suzhou, China, 2017: 745-751
[12] Choi H S, Kim J H, Huh J, et al. Phase-aware speech enhancement with deep complex U-Net. 7th International Conference on Learning Representations, ICLR, New Orleans, USA, 2019
[13] Kong Q, Cao Y, Liu H, et al. Decoupling magnitude and phase estimation with deep ResUNet for music source separation. 22nd International Society for Music Information Retrieval Conference, ISMIR, Online, 2021: 342-349
[14] Nie L Z, Chen X Q, Zhao H M. Speech enhancement method combining magnitude-spectrum and power-spectrum dictionaries. ACTA ACUSTICA, 2021; 46(1): 81-91 DOI: 10.15949/j.cnki.0371-0025.2021.01.008
[15] Bai H J, Zhang T Q, Liu J X, et al. Single-channel speech enhancement method combining accurate ratio masking and deep neural networks. ACTA ACUSTICA, 2022; 47(3): 394-404 DOI: 10.15949/j.cnki.0371-0025.2022.03.009
[16] Wu R Q, Chen X Q, Yu J, et al. Application of an improved U-Net with attention mechanism to end-to-end speech enhancement. ACTA ACUSTICA, 2022; 47(2): 266-275 DOI: 10.15949/j.cnki.0371-0025.2022.02.011
[17] Yin D C, Luo C, Xiong Z W, et al. PHASEN: A phase-and-harmonics-aware speech enhancement network. 34th AAAI Conference on Artificial Intelligence, AAAI, New York, USA, 2020
[18] Luo Y, Mesgarani N. TasNet: Time-domain audio separation network for real-time, single-channel speech separation. 43rd IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, Calgary, Canada, 2018: 696-700
[19] Luo Y, Mesgarani N. Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation. IEEE/ACM Trans. Audio Speech Lang. Process., 2019; 27(8): 1256-1266 DOI: 10.1109/TASLP.2019.2915167
[20] Zuo K C, Chen P, Wang Z, et al. Research status of aircraft cabin interior noise. Acta Aeronautica et Astronautica Sinica, 2016; 37(8): 2370-2384 DOI: 10.7527/S1000-6893.2016.0073
[21] Lu Y, Wang F, Ma X. Helicopter interior noise reduction using compounded periodic struts. J. Sound Vib., 2018; 435: 264-280 DOI: 10.1016/j.jsv.2018.07.024
[22] Ma D X, Li P Z, Dai G H, et al. Shock noise of choked jets. ACTA ACUSTICA, 1980; 5(3): 172-182 DOI: 10.15949/j.cnki.0371-0025.1980.03.002
[23] Cui A J, Li D J, Zhou K, et al. Low-frequency signal synthesis method under array structures. Acta Physica Sinica, 2020; 69(19): 194101 DOI: 10.7498/aps.69.20200501
[24] Ren Z, Kong Q, Han J, et al. Attention-based atrous convolutional neural networks: Visualisation and understanding perspectives of acoustic scenes. 44th IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, Brighton, UK, 2019: 56-60
[25] Xian Y, Sun Y, Wang W W, et al. Convolutional fusion network for monaural speech enhancement. Neural Netw., 2021; 143: 97-107 DOI: 10.1016/j.neunet.2021.05.017
[26] Hu Y, Loizou P C. Evaluation of objective quality measures for speech enhancement. IEEE Trans. Audio Speech Lang. Process., 2008; 16(1): 229-238 DOI: 10.1109/TASL.2007.911054
[27] Taal C H, Hendriks R C, Heusdens R, et al. An evaluation of objective measures for intelligibility prediction of time-frequency weighted noisy speech. J. Acoust. Soc. Am., 2011; 130(5): 3013-3027 DOI: 10.1121/1.3641373
[28] Hu Y, Loizou P C. Subjective comparison and evaluation of speech enhancement algorithms. Speech Commun., 2007; 49: 588-601 DOI: 10.1016/j.specom.2006.12.006