Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

GAO Changfeng, CHENG Gaofeng, ZHANG Pengyuan. Consistency self-supervised learning method for robust automatic speech recognition[J]. ACTA ACUSTICA, 2023, 48(3): 578-587. DOI: 10.15949/j.cnki.0371-0025.2023.03.008

Consistency self-supervised learning method for robust automatic speech recognition

  • A robust automatic speech recognition (ASR) method based on consistency self-supervised learning (CSSL) is proposed. The method uses speech simulation to generate versions of the same speech in different acoustic environments, then uses self-supervised learning to extract speech representations and maximizes the similarity between the representations of the simulated versions. In this way, speech representations that are invariant to the acoustic environment can be extracted, and ASR performance can be improved. The proposed method is evaluated on the far-field dataset CHiME-4 and the meeting dataset AMI. With CSSL and an appropriate pre-training pipeline, a relative word error rate reduction of up to 30% is achieved compared with wav2vec 2.0. This shows that CSSL can extract noise-invariant speech features and effectively improve ASR performance.
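The consistency idea in the abstract (simulate two acoustic conditions for the same utterance, encode both, and push the representations together) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: `add_noise` stands in for the speech simulation, `encode` is a toy log-spectrum front end standing in for a self-supervised encoder such as wav2vec 2.0, and cosine similarity is one plausible choice of similarity measure, which the abstract does not specify.

```python
import numpy as np


def add_noise(wave, snr_db, rng):
    """Hypothetical stand-in for speech simulation: add white noise at a target SNR (dB)."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise


def encode(wave, frame_len=400, hop=160):
    """Toy encoder (assumption): per-frame log-magnitude spectrum, shape (frames, bins)."""
    n_frames = 1 + (len(wave) - frame_len) // hop
    frames = np.stack([wave[i * hop: i * hop + frame_len] for i in range(n_frames)])
    window = np.hanning(frame_len)
    return np.log1p(np.abs(np.fft.rfft(frames * window, axis=1)))


def consistency_loss(rep_a, rep_b, eps=1e-8):
    """One minus the mean frame-wise cosine similarity between two views.

    Minimizing this value maximizes the similarity between the representations.
    """
    dot = np.sum(rep_a * rep_b, axis=1)
    norm = np.linalg.norm(rep_a, axis=1) * np.linalg.norm(rep_b, axis=1)
    return float(1.0 - np.mean(dot / (norm + eps)))


# Two simulated "acoustic environments" for the same synthetic utterance.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 220.0 * t)  # 1 s synthetic tone, not real speech
view_a = add_noise(clean, snr_db=20.0, rng=rng)
view_b = add_noise(clean, snr_db=5.0, rng=rng)

loss = consistency_loss(encode(view_a), encode(view_b))
print(f"consistency loss: {loss:.4f}")
```

In an actual pre-training setup, a term of this kind would presumably be added to the self-supervised objective and back-propagated through a trainable encoder; the fixed spectral front end here only shows how the quantity is computed.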
