
FAN Junyi, YANG Jibin, ZHANG Xiongwei, ZHENG Changyan. Monaural speech enhancement using U-net fused with multi-head self-attention[J]. ACTA ACUSTICA, 2022, 47(6): 703-716. DOI: 10.15949/j.cnki.0371-0025.2022.06.007

Monaural speech enhancement using U-net fused with multi-head self-attention

Under low Signal-to-Noise Ratio (SNR) and burst background noise conditions, existing deep learning-based speech enhancement methods perform unsatisfactorily. Humans, by contrast, can exploit the long-term correlation of speech to form an integrated perception of different speech signals, so modeling the long-term dependencies of speech can help improve enhancement performance under low SNR and burst background noise. Inspired by this, a time-domain end-to-end monaural speech enhancement model, TU-net, is proposed that fuses the multi-head self-attention mechanism with the U-net deep network. TU-net adopts the encoder-decoder layer structure of U-net to achieve multi-scale feature fusion, and introduces a dual-path Transformer module based on multi-head self-attention to compute the speech mask and better model long-term correlation. The TU-net model is trained with a loss function formed as a weighted sum of time-domain, time-frequency-domain, and perceptual-domain terms. Extensive experiments show that, under low SNR and burst background noise conditions, TU-net outperforms other comparable monaural enhancement network models on several evaluation metrics, including Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), and SNR gain, while keeping the number of network model parameters relatively small.
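The dual-path Transformer masking described in the abstract can be illustrated with a short PyTorch sketch: multi-head self-attention is applied first within short chunks (local structure) and then across chunks (long-term dependencies), and a sigmoid head produces a bounded mask. This is a minimal sketch of the general dual-path idea under assumed module names, dimensions, and single-block depth; it is not the authors' TU-net implementation.

```python
# Minimal dual-path attention masking sketch (illustrative, not the paper's code).
import torch
import torch.nn as nn


class DualPathAttentionMask(nn.Module):
    def __init__(self, feat_dim=64, num_heads=4, chunk_len=100):
        super().__init__()
        self.chunk_len = chunk_len
        # Intra-chunk attention models short-range structure inside each chunk.
        self.intra_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        # Inter-chunk attention models long-term dependencies across chunks.
        self.inter_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(feat_dim)
        self.norm2 = nn.LayerNorm(feat_dim)
        self.mask_head = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())

    def forward(self, x):
        # x: (batch, time, feat_dim) features, e.g. from a U-net style encoder.
        b, t, d = x.shape
        pad = (-t) % self.chunk_len
        x = nn.functional.pad(x, (0, 0, 0, pad))
        n_chunks = x.shape[1] // self.chunk_len

        # Intra-chunk pass: fold chunks into the batch dimension.
        intra = x.reshape(b * n_chunks, self.chunk_len, d)
        intra = self.norm1(intra + self.intra_attn(intra, intra, intra)[0])

        # Inter-chunk pass: attend across chunks at each within-chunk position.
        inter = intra.reshape(b, n_chunks, self.chunk_len, d).transpose(1, 2)
        inter = inter.reshape(b * self.chunk_len, n_chunks, d)
        inter = self.norm2(inter + self.inter_attn(inter, inter, inter)[0])

        # Restore (batch, time, feat_dim) and predict a bounded mask.
        out = inter.reshape(b, self.chunk_len, n_chunks, d).transpose(1, 2)
        out = out.reshape(b, n_chunks * self.chunk_len, d)[:, :t]
        return self.mask_head(out)  # to be multiplied with the encoder features


# Example: mask a feature sequence of 400 frames with 64 channels.
feats = torch.randn(2, 400, 64)
mask = DualPathAttentionMask()(feats)
print(mask.shape)  # torch.Size([2, 400, 64])
```

The chunking step is what keeps self-attention tractable on long time-domain sequences: attention cost grows quadratically in sequence length, so splitting into chunks and alternating intra- and inter-chunk attention covers the full context at much lower cost.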