Online generative speech enhancement with diffusion gradient guidance

LI Chenda; QIAN Yanmin

doi:10.12395/0371-0025.2024270

Abstract

A generative speech enhancement method based on diffusion gradient guidance is proposed. The method addresses the high computational complexity caused by multi-step inference when diffusion models are applied to speech enhancement. A discriminative model is used to quickly estimate the data gradients required during diffusion model inference, which avoids costly neural network computations and makes diffusion models usable for real-time speech enhancement. Experimental results show that the proposed method not only reduces computational overhead, enabling an online generative speech enhancement model with a latency of 50 ms, but also improves the quality of the enhanced speech. In addition, data gradient guidance provided by a speech recognition model improves recognition performance on the enhanced speech.
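The core idea, replacing a per-step score-network call with a gradient derived from a fast discriminative estimate, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a variance-exploding perturbation x_t = x_0 + sigma_t * noise and uses Tweedie's formula to approximate the data gradient as (x0_hat - x_t) / sigma_t^2. The function names, the moving-average "enhancer", and the noise schedule are all hypothetical stand-ins.

```python
import numpy as np


def discriminative_estimate(noisy):
    """Hypothetical stand-in for a fast discriminative enhancer.

    A 5-tap moving average is used purely for illustration; a real system
    would use a trained model here.
    """
    kernel = np.ones(5) / 5.0
    return np.convolve(noisy, kernel, mode="same")


def guided_score(x_t, clean_estimate, sigma_t):
    """Approximate the data gradient via Tweedie's formula.

    Assumes x_t = x_0 + sigma_t * noise, so that
    grad log p_t(x_t) is approximately (E[x_0 | x_t] - x_t) / sigma_t**2,
    with E[x_0 | x_t] replaced by the discriminative estimate.
    """
    return (clean_estimate - x_t) / sigma_t**2


def guided_reverse_diffusion(noisy, sigmas, seed=0):
    """Run a deterministic reverse process using the guided score.

    `sigmas` is a decreasing noise schedule (an assumption of this sketch).
    The clean estimate is computed once and reused at every step, so no
    score network is evaluated inside the loop.
    """
    rng = np.random.default_rng(seed)
    clean_estimate = discriminative_estimate(noisy)
    # Start from the noisy observation perturbed at the largest noise level.
    x = noisy + sigmas[0] * rng.standard_normal(noisy.shape)
    for sigma_cur, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        score = guided_score(x, clean_estimate, sigma_cur)
        # Euler step of the probability-flow ODE for the VE perturbation:
        # dx = -0.5 * d(sigma^2) * score, integrated from sigma_cur to sigma_next.
        x = x + 0.5 * (sigma_cur**2 - sigma_next**2) * score
    return x


if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 400, endpoint=False)
    clean = np.sin(2.0 * np.pi * 5.0 * t)
    noisy = clean + 0.3 * np.random.default_rng(1).standard_normal(t.shape)
    enhanced = guided_reverse_diffusion(noisy, np.geomspace(1.0, 0.01, 30))
    print("noisy MSE:   ", np.mean((noisy - clean) ** 2))
    print("enhanced MSE:", np.mean((enhanced - clean) ** 2))
```

Because the estimate is held fixed in this sketch, the trajectory simply contracts toward the discriminative output. The sketch is meant only to show where the gradient substitution removes the per-step network cost, not to reproduce the reported quality or recognition gains.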
