Abstract:
To address the low robustness of existing anonymization methods and the limited utility of anonymized speech in downstream tasks, a speaker anonymization method based on adversarial sample generation is proposed. The Adam algorithm is used to iteratively generate adversarial samples, which modify the speaker features so that the corresponding speaker classification result changes, thereby anonymizing the speech. Experimental results show that, compared with the B1 baseline of the VoicePrivacy 2024 Challenge, the method raises the equal error rate of speaker recognition under semi-informed attacks from 7.64% to 26.30%, substantially improving the robustness of anonymization. Compared with the B6 baseline, it reduces the word error rate in speech recognition from 9.39% to 4.25%, and, compared with the state-of-the-art (SOTA) system S1, it improves emotion recognition accuracy from 37.84% to 40.18%, effectively preserving the utility of anonymized speech in downstream tasks. Because the adversarial perturbations significantly alter the speaker classification result, the original speaker’s identity is concealed and anonymization becomes more robust; because only relatively minor changes are made to the speaker features, the utility of the data in downstream tasks is preserved.
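The core idea, an Adam-driven perturbation of speaker features that changes the speaker classification result while staying small, can be illustrated with a minimal sketch. The abstract does not specify the classifier, the loss, or the perturbation bound, so everything below is an assumption made for illustration: a toy linear softmax classifier stands in for the speaker classifier, the objective is untargeted (lower the log-probability of the original speaker), and a hypothetical bound `eps` clips each feature change. The names `classify` and `adam_adversarial` are likewise hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(weights, x):
    """Class probabilities of a toy linear speaker classifier (assumption)."""
    logits = [sum(w * f for w, f in zip(row, x)) for row in weights]
    return softmax(logits)


def adam_adversarial(weights, x, label, eps=0.5, lr=0.05, steps=200,
                     beta1=0.9, beta2=0.999, tiny=1e-8):
    """Perturb speaker features x until the classifier no longer predicts label.

    Adam minimizes log p(label | x + delta); delta is clipped to [-eps, eps]
    per feature so the change to the features stays small (assumed constraint).
    """
    dim = len(x)
    delta = [0.0] * dim
    m = [0.0] * dim  # first-moment estimate
    v = [0.0] * dim  # second-moment estimate
    for t in range(1, steps + 1):
        x_adv = [f + d for f, d in zip(x, delta)]
        probs = classify(weights, x_adv)
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted != label:
            break  # classification result has changed; stop early
        # Gradient of log p[label] w.r.t. the input of a linear softmax model:
        # W[label] - sum_k p_k * W[k]
        grad = [
            weights[label][j] - sum(p * row[j] for p, row in zip(probs, weights))
            for j in range(dim)
        ]
        for j in range(dim):
            m[j] = beta1 * m[j] + (1 - beta1) * grad[j]
            v[j] = beta2 * v[j] + (1 - beta2) * grad[j] ** 2
            m_hat = m[j] / (1 - beta1 ** t)
            v_hat = v[j] / (1 - beta2 ** t)
            delta[j] -= lr * m_hat / (math.sqrt(v_hat) + tiny)
            delta[j] = max(-eps, min(eps, delta[j]))  # keep the change small
    return [f + d for f, d in zip(x, delta)]
```

In a real system the gradient would come from automatic differentiation through a neural speaker classifier rather than from the closed form used here, and the modified features would then be passed to a synthesis stage; both of those steps are outside the scope of this sketch.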