Citation: | LIU Zongming, WANG Li, LI Junfeng, ZHANG Pengyuan. Mispronunciation detection and diagnosis with acoustic pronunciation model aided modeling[J]. ACTA ACUSTICA, 2023, 48(1): 264-273. DOI: 10.15949/j.cnki.0371-0025.2023.01.020 |
[1] |
Li K, Qian X, Meng H. Mispronunciation detection and diagnosis in L2 English speech using multidistribution deep neural networks. IEEE/ACM Trans. Audio Speech Lang. Process., 2016; 25(1):193-207
|
[2] |
Yuan H, Shi Y Z, Zhao J H et al. A method for improving mispronunciation detection based on JSM and MLP (in Chinese). Acta Automatica Sinica, 2014; 40(12):2815-2823
|
[3] |
Leung W K, Liu X, Meng H. CNN-RNN-CTC based end-to-end mispronunciation detection and diagnosis. IEEE International Conference on Acoustics, Speech and Signal Processing, 2019:8132-8136
|
[4] |
Feng Y, Fu G, Chen Q et al. SED-MDD: Towards sentence dependent end-to-end mispronunciation detection and diagnosis. IEEE International Conference on Acoustics, Speech and Signal Processing, 2020:3492-3496
|
[5] |
Wu M, Li K, Leung W K et al. Transformer based end-to-end mispronunciation detection and diagnosis. Interspeech, ISCA, 2021:3954-3958
|
[6] |
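Huang H, Wang J M, Halidan Abudureyimu, Wushouer Silamu. An acoustic model training method based on F1-score maximization for automatic mispronunciation detection (in Chinese). Acta Acustica, 2013; 38(6):751-758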
|
[7] |
Kawai G, Hirose K. A method for measuring the intelligibility and nonnativeness of phone quality in foreign language
|
[8] |
Harrison A M, Lau W Y, Meng H M et al. Improving mispronunciation detection and diagnosis of learners' speech with context-sensitive phonological rules based on language transfer. Ninth Annual Conference of the International Speech Communication Association, 2008:2787-2790
|
[9] |
Harrison A M, Lo W K, Qian X et al. Implementation of an extended recognition network for mispronunciation detection and diagnosis in computer-assisted pronunciation training. International Workshop on Speech and Language Technology in Education, 2009:2787-2790
|
[10] |
Wang Y B, Lee L S. Improved approaches of modeling and detecting error patterns with empirical analysis for computer-aided pronunciation training. IEEE International Conference on Acoustics, Speech and Signal Processing, 2012:5049-5052
|
[11] |
Ge F P, Pan F P, Dong B, Yan Y H. An experimental study on the evaluation of Chinese pronunciation quality (in Chinese). Acta Acustica, 2010; 35(2):261-266
|
[12] |
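Zhang J S, Gao Y M, Xie Y L. DNN-based detection of pronunciation erroneous tendencies (in Chinese). Journal of Tsinghua University (Science and Technology), 2016; 56(11):1220-1225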
|
[13] |
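An L L, Wu Y N, Liu Z et al. A new mispronunciation detection algorithm based on an error-detection phone network (in Chinese). Journal of Electronics & Information Technology, 2012; 34(9):2085-2090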
|
[14] |
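Yuan H, Qian Y M, Zhao J H et al. A method for improving mispronunciation detection based on an optimized detection network and MLP features (in Chinese). Journal of Tsinghua University (Science and Technology), 2012; 52(4):557-560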
|
[15] |
Zhang R, Han J Q. A pronunciation quality evaluation method based on the perceptibility of phoneme models (in Chinese). Acta Acustica, 2013; 38(2):201-207
|
[16] |
Qian X, Soong F K, Meng H. Discriminative acoustic model for improving mispronunciation detection and diagnosis in computer-aided pronunciation training (CAPT). Eleventh Annual Conference of the International Speech Communication Association, 2010:757-760
|
[17] |
Luo D, Yang X, Wang L. Improvement of segmental mispronunciation detection with prior knowledge extracted from large L2 speech corpus. Twelfth Annual Conference of the International Speech Communication Association, 2011:1593-1596
|
[18] |
Lo W K, Zhang S, Meng H. Automatic derivation of phonological rules for mispronunciation detection in a computer-assisted pronunciation training system. Eleventh Annual Conference of the International Speech Communication Association, 2010:765-768
|
[19] |
Qian X, Meng H, Soong F. Capturing L2 segmental mispronunciations with joint-sequence models in computer-aided pronunciation training (CAPT). 7th International Symposium on Chinese Spoken Language Processing, IEEE, 2010:84-88
|
[20] |
Duan R, Kawahara T, Dantsuji M et al. Effective articulatory modeling for pronunciation error detection of L2 learner without non-native training data. IEEE International Conference on Acoustics, Speech and Signal Processing, 2017:5815-5819
|
[21] |
Korzekwa D, Lorenzo-Trueba J, Zaporowski S et al. Mispronunciation detection in non-native (L2) English with uncertainty modeling. IEEE International Conference on Acoustics, Speech and Signal Processing, 2021:7738-7742
|
[22] |
Higuchi Y, Watanabe S, Chen N et al. Mask CTC: Non-autoregressive end-to-end ASR with CTC and mask predict. Interspeech, ISCA, 2020:3655-3659
|
[23] |
Zhao G, Sonsaat S, Silpachai A et al. L2-ARCTIC: A non-native English speech corpus. Interspeech, ISCA, 2018:2783-2787
|
[24] |
Garofolo J S, Lamel L F, Fisher W M et al. DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1. NIST Interagency/Internal Report (NISTIR)-4930, National Institute of Standards and Technology, 1993
|
[25] |
Zhang Z, Wang Y, Yang J. Text-conditioned transformer for automatic pronunciation error detection. Speech Commun., 2021; 130:55-63
|
[26] |
Peng L, Fu K, Lin B et al. A study on fine-tuning wav2vec2.0 model for the task of mispronunciation detection and diagnosis. Interspeech, ISCA, 2021:4448-4452
|
[27] |
Yan B C, Wu M C, Hung H T et al. An end-to-end mispronunciation detection system for L2 English speech leveraging novel anti-phone modeling. Interspeech, ISCA, 2020:3032-3036
|
[28] |
Watanabe S, Hori T, Kim S et al. Hybrid CTC/attention architecture for end-to-end speech recognition. IEEE J. Sel. Top. Signal Process., 2017; 11(8):1240-1253
|