Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

Speaker anonymization using adversarial sample generation

    Abstract: To address the weak robustness of current speaker anonymization methods and the limited utility of anonymized speech in downstream tasks, a speaker anonymization method based on adversarial sample generation is proposed. The Adam algorithm is applied iteratively to generate adversarial samples, which are used to modify the speaker features and change the corresponding speaker classification result, thereby anonymizing the speech. Experimental results show that, compared with the B1 baseline of the VoicePrivacy 2024 Challenge, the proposed method raises the equal error rate (EER) of speaker recognition under the semi-informed attack scenario from 7.64% to 26.30%, substantially improving the robustness of anonymization. Compared with the B6 baseline, it reduces the word error rate (WER) in speech recognition from 9.39% to 4.25%, and compared with the SOTA system S1, it raises emotion recognition accuracy from 37.84% to 40.18%, preserving the utility of anonymized speech in downstream tasks. Because the adversarial perturbations significantly alter the speaker classification result, the original speaker's identity is concealed and anonymization becomes more robust; because the process makes only small changes to the speaker features, the data remain usable in downstream tasks.
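The iterative procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the speaker classifier, the loss, or any hyperparameters, so everything here is an assumption made for illustration. The sketch assumes a toy linear softmax classifier (`W`, `b`) over a speaker embedding `x`, an untargeted objective that lowers the log-probability of the originally predicted speaker, a hand-written Adam update on the perturbation `delta`, and a hypothetical bound `eps` that limits how far each feature may move.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def adam_adversarial_embedding(x, W, b, eps=2.0, lr=0.05, steps=500,
                               beta1=0.9, beta2=0.999, tiny=1e-8):
    """Perturb embedding x with Adam until the toy classifier changes its decision.

    All names and defaults are illustrative assumptions, not values from the paper.
    Returns the perturbed embedding and the originally predicted speaker index.
    """
    src = int(np.argmax(W @ x + b))        # speaker predicted for the clean embedding
    onehot = np.zeros(W.shape[0])
    onehot[src] = 1.0
    delta = np.zeros_like(x)               # adversarial perturbation being optimized
    m = np.zeros_like(x)                   # Adam first-moment estimate
    v = np.zeros_like(x)                   # Adam second-moment estimate
    for t in range(1, steps + 1):
        p = softmax(W @ (x + delta) + b)
        # Gradient of log p[src] with respect to the input embedding.
        grad = W.T @ (onehot - p)
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad ** 2
        m_hat = m / (1.0 - beta1 ** t)     # bias-corrected moments
        v_hat = v / (1.0 - beta2 ** t)
        # Descend on log p[src], i.e. push probability away from the source speaker.
        delta = delta - lr * m_hat / (np.sqrt(v_hat) + tiny)
        delta = np.clip(delta, -eps, eps)  # keep each feature change within eps
        if int(np.argmax(W @ (x + delta) + b)) != src:
            break                          # stop as soon as the decision flips
    return x + delta, src


# Toy usage with random weights standing in for a trained speaker classifier.
rng = np.random.default_rng(0)
n_speakers, dim = 5, 32
W = rng.normal(size=(n_speakers, dim))
b = np.zeros(n_speakers)
x = 0.2 * W[3]                             # synthetic embedding aligned with one speaker row
x_adv, src = adam_adversarial_embedding(x, W, b)
new = int(np.argmax(W @ x_adv + b))
```

Stopping at the first decision change is one simple way to keep the modification small; a real system would more likely trade off an identity loss against a distortion or utility term. The toy setup says nothing about the EER, WER, or emotion recognition figures reported above.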

     
