Spoofed speech detection integrating glottal flow and speaker features

Abstract: Current mainstream spoofed speech detection systems do not analyze the flaws introduced by the acoustic models of spoofing algorithms. To address this, this paper proposes a spoofed speech detection method based on glottal flow and speaker features. The method uses a glottal flow model to capture rhythm artifacts, enabling detection of speech generated by speech synthesis algorithms, and adds a speaker-feature-based model to improve detection of speech generated by voice conversion algorithms. It achieves equal error rates of 0.73% and 7.79% on the ASVspoof 2019 LA and ASVspoof 2021 LA evaluation sets, respectively, and is robust across a variety of channel and codec conditions.
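The equal error rate (EER) quoted above is the operating point at which the false acceptance rate (spoofed speech accepted as bona fide) equals the false rejection rate (bona fide speech rejected). A minimal sketch of how such a value can be computed from detector scores follows. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that higher scores mean "more likely bona fide", and the simple threshold sweep without interpolation are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: a higher score means the utterance is more likely bona fide.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct score seen in either class.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection rate: bona fide utterances scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed utterances scored at or above it.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest and
    # report their average as the EER (no interpolation between points).
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2.0)


if __name__ == "__main__":
    # Toy scores for illustration only; these are not results from the paper.
    eer = equal_error_rate([0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.7, 0.2])
    print(f"EER = {eer:.2%}")
```

With few trials the sweep is coarse, so evaluation toolkits typically interpolate between neighbouring thresholds; the sketch omits that step for brevity.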