Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

Citation: LIU Zongming, WANG Li, LI Junfeng, ZHANG Pengyuan. Mispronunciation detection and diagnosis with acoustic pronunciation model aided modeling[J]. ACTA ACUSTICA, 2023, 48(1): 264-273. DOI: 10.15949/j.cnki.0371-0025.2023.01.020

Mispronunciation detection and diagnosis with acoustic pronunciation model aided modeling

Abstract: Expert-annotated data for mispronunciation detection and diagnosis (MDD) are scarce. To model pronunciation regularities more efficiently from limited data and thereby aid phoneme-recognition-based MDD, this paper proposes an acoustic pronunciation model that fuses acoustic and textual information and models the mispronunciation generation process in a theoretically more complete way. Exploiting the acoustic correlation between different parts of that process, the model shares its acoustic encoder parameters with the phoneme recognition model, and the two are jointly optimized through multi-task learning. An acoustic confidence masking-prediction training scheme is further proposed to strengthen the link between the two tasks and improve the efficiency of aided modeling. Experiments show that the acoustic pronunciation model effectively captures mispronunciation regularities. With its aid, the MDD system improves by 4.9%, 9.5%, and 14.0% in mispronunciation detection, diagnosis, and phoneme recognition, respectively. The masking-prediction training improves the efficiency of aided modeling, and the choice of masking parameters and joint-optimization parameters also affects its effectiveness.
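The joint optimization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the multi-task objective, so the interpolation below, the function name `joint_loss`, and the parameter `weight` are all assumptions made for illustration and may differ from the paper's formulation.

```python
def joint_loss(phone_loss: float, pron_loss: float, weight: float = 0.5) -> float:
    """Combine two task losses into one training objective.

    Hypothetical form: a convex combination of the phoneme recognition
    loss and the acoustic pronunciation model loss. The paper may use a
    different weighting scheme.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # weight = 0 trains only the phoneme recognizer;
    # weight = 1 trains only the auxiliary pronunciation model.
    return (1.0 - weight) * phone_loss + weight * pron_loss


if __name__ == "__main__":
    # Illustrative loss values only, not results from the paper.
    print(joint_loss(2.0, 4.0, weight=0.25))
```

Under this assumed form, both losses backpropagate into the shared acoustic encoder, and `weight` is one example of a joint-optimization parameter whose choice the abstract reports as affecting the result.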

     
