Abstract:
Current mainstream spoofed speech detection systems do not analyze the flaws introduced by the acoustic models of speech synthesis algorithms. This paper proposes a spoofed speech detection method based on glottal flow and speaker features. The method uses a glottal flow model to capture rhythm artifacts, enabling the detection of speech generated by speech synthesis algorithms. In addition, a detection model based on speaker features is introduced to improve the detection of speech generated by voice conversion algorithms. The proposed method achieves equal error rates of 0.73% and 7.79% on the ASVspoof 2019 LA and ASVspoof 2021 LA evaluation sets, respectively, indicating robustness across various channel and codec conditions.
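For readers unfamiliar with the reported metric, the equal error rate (EER) is the operating point at which the false acceptance rate (spoofed speech accepted as bona fide) equals the false rejection rate (bona fide speech rejected). The following is a minimal sketch of how it can be estimated from detection scores. It is not the official ASVspoof scoring tool; the function name `equal_error_rate`, the score convention (higher means more likely bona fide), and the toy scores are assumptions made for illustration only.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER and its threshold (hypothetical helper).

    Assumes a higher score means the utterance is more likely bona fide.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct observed score, in ascending order.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection rate: bona fide utterances scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed utterances scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER lies where the two error curves cross; on finite data we take
    # the threshold with the smallest gap and average the two rates.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Toy scores, invented purely for illustration.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6])
print(f"EER = {100 * eer:.2f}% at threshold {threshold:.2f}")
```

Under this convention, an EER of 0.73% means that at the crossing threshold roughly 0.73% of spoofed utterances are accepted and roughly 0.73% of bona fide utterances are rejected.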