Consistency self-supervised learning method for robust automatic speech recognition

GAO Changfeng; CHENG Gaofeng; ZHANG Pengyuan

doi:10.15949/j.cnki.0371-0025.2023.03.008

GAO Changfeng, CHENG Gaofeng, ZHANG Pengyuan. Consistency self-supervised learning method for robust automatic speech recognition[J]. ACTA ACUSTICA, 2023, 48(3): 578-587. DOI: 10.15949/j.cnki.0371-0025.2023.03.008

Abstract

A robust automatic speech recognition (ASR) method based on consistency self-supervised learning (CSSL) is proposed. The method uses speech simulation to generate versions of the same speech under different acoustic environments, then uses self-supervised learning to extract speech representations and maximize the similarity between the representations of the simulated versions. In this way, speech representations that are invariant across acoustic environments can be extracted, and ASR performance can be improved. The proposed method is evaluated on the far-field dataset CHiME-4 and the meeting dataset AMI. With CSSL and an appropriate pre-training pipeline, a relative word error rate reduction of up to 30% is achieved compared with wav2vec 2.0. This shows that CSSL can extract noise-invariant speech features and effectively improve ASR performance.
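
The consistency idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss, so the code below assumes a negative mean frame-wise cosine similarity between the representations of two simulated views of the same utterance; the function names, the toy vectors, and the word error rate figures are hypothetical and are not taken from the paper.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors (assumption)
    return dot / (norm_u * norm_v)


def consistency_loss(reps_a, reps_b):
    """Negative mean frame-wise cosine similarity between two views.

    reps_a and reps_b are lists of frame vectors for the same utterance
    under two simulated acoustic conditions. Minimizing the returned
    value maximizes the similarity between the two views.
    This loss form is an assumption, not the paper's stated objective.
    """
    if not reps_a or len(reps_a) != len(reps_b):
        raise ValueError("views must be non-empty and have equal frame counts")
    sims = [cosine_similarity(a, b) for a, b in zip(reps_a, reps_b)]
    return -sum(sims) / len(sims)


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative word error rate reduction with respect to a baseline."""
    return (baseline_wer - new_wer) / baseline_wer


# Toy stand-ins for encoder outputs: 5 frames of 4-dimensional features.
rng = random.Random(0)
clean = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]
# A second "view": the same frames with small additive perturbations,
# standing in for representations of simulated noisy speech.
noisy = [[x + rng.gauss(0.0, 0.1) for x in frame] for frame in clean]

loss_identical = consistency_loss(clean, clean)
loss_noisy = consistency_loss(clean, noisy)

# Made-up WER values for illustration only (not results from the paper).
example_reduction = relative_wer_reduction(10.0, 7.0)
```

In this sketch the loss reaches its minimum of -1 when the two views yield identical representations, which is the sense in which the representations become invariant to the acoustic environment. The last line shows how a relative reduction is computed: a hypothetical drop from 10.0% to 7.0% word error rate is a 30% relative reduction.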
