Monaural speech enhancement using U-net fused with multi-head self-attention
Graphical Abstract
Abstract
Under low Signal-to-Noise Ratio (SNR) and burst background noise conditions, existing deep learning-based speech enhancement methods perform unsatisfactorily. Humans, in contrast, exploit the long-term correlation of speech to form an integrated perception of different speech signals, which suggests that modeling the long-term dependencies of speech can improve enhancement performance under low SNR and burst background noise. Inspired by this observation, we propose TU-net, a time-domain end-to-end monaural speech enhancement model that fuses the multi-head self-attention mechanism with the U-net deep network. TU-net adopts the encoder-decoder structure of U-net to achieve multi-scale feature fusion and introduces a dual-path Transformer module, built on multi-head self-attention, to compute the speech mask and better model long-term correlation. The model is trained with a weighted sum of loss functions defined in the time domain, time-frequency domain, and perceptual domain. Extensive experiments show that TU-net outperforms other comparable monaural enhancement network models on several evaluation metrics, including Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), and SNR gain, under low SNR and burst background noise conditions, while keeping the number of model parameters relatively small.
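
A dual-path Transformer module of the kind described above typically chunks the feature sequence and alternates self-attention within each chunk (short-term structure) and across chunks (long-term correlation). The following is a minimal sketch of one such block, assuming PyTorch and an input of shape (batch, channels, num_chunks, chunk_len); the class name and hyperparameters are illustrative and not taken from the paper.

# Minimal sketch (not the authors' code) of a dual-path Transformer
# block with multi-head self-attention over chunked features.
import torch
import torch.nn as nn

class DualPathTransformerBlock(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Intra-chunk layer models short-term structure inside a chunk;
        # inter-chunk layer models long-term dependencies across chunks.
        self.intra = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True)
        self.inter = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, k, s = x.shape  # batch, channels, num_chunks, chunk_len
        # Intra-chunk self-attention: attend over positions within a chunk.
        intra_in = x.permute(0, 2, 3, 1).reshape(b * k, s, c)
        intra_out = self.intra(intra_in).reshape(b, k, s, c)
        # Inter-chunk self-attention: attend over chunks at each position,
        # which is what captures the long-term correlation of speech.
        inter_in = intra_out.permute(0, 2, 1, 3).reshape(b * s, k, c)
        inter_out = self.inter(inter_in).reshape(b, s, k, c)
        return inter_out.permute(0, 3, 2, 1)  # back to (b, c, k, s)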
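The weighted sum loss can likewise be sketched. The form below is an assumption for illustration only: an L1 waveform term (time domain), an STFT-magnitude term (time-frequency domain), and a log-magnitude term standing in for the perceptual term; the weights w1, w2, w3, the STFT settings, and the function name are hypothetical, and the paper's actual terms may differ.

# Minimal sketch (assumed form) of a weighted sum loss over the time,
# time-frequency, and perceptual domains, for (batch, time) waveforms.
import torch
import torch.nn.functional as F

def weighted_enhancement_loss(est: torch.Tensor, ref: torch.Tensor,
                              w1: float = 1.0, w2: float = 1.0,
                              w3: float = 1.0) -> torch.Tensor:
    # Time-domain term: sample-level L1 distance between waveforms.
    time_loss = F.l1_loss(est, ref)
    # Time-frequency term: L1 distance between STFT magnitudes.
    win = torch.hann_window(512, device=est.device)
    est_mag = torch.stft(est, n_fft=512, hop_length=128, window=win,
                         return_complex=True).abs()
    ref_mag = torch.stft(ref, n_fft=512, hop_length=128, window=win,
                         return_complex=True).abs()
    tf_loss = F.l1_loss(est_mag, ref_mag)
    # Perceptual term: log-magnitude distance as a crude stand-in for a
    # perceptually motivated loss (the paper's term is not specified here).
    perc_loss = F.l1_loss(torch.log1p(est_mag), torch.log1p(ref_mag))
    return w1 * time_loss + w2 * tf_loss + w3 * perc_loss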