Online generative speech enhancement with diffusion gradient guidance
Abstract
We propose a generative speech enhancement method based on diffusion gradient guidance. The method addresses the high computational complexity caused by multi-step inference when diffusion models are applied to speech enhancement. A discriminative model is used to quickly estimate the data gradients required during diffusion inference. This avoids complex neural network computations and makes real-time speech enhancement with diffusion models feasible. Experiments show that the approach not only reduces computational overhead, yielding an online generative speech enhancement model with a latency of 50 ms, but also improves the quality of the enhanced speech. In addition, guiding the diffusion process with gradients from a speech recognition model improves recognition accuracy on the enhanced speech.
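The core idea, replacing a learned score network with a cheap gradient estimate derived from a discriminative model's output, can be illustrated with a minimal sketch. It rests on several assumptions that come from common diffusion practice and not from this abstract. It uses a variance-exploding noise schedule and Euler steps of the probability-flow ODE. It converts a point estimate D(x, sigma, y) into a score with Tweedie's formula, score ≈ (D − x) / sigma². An optional guidance gradient is added to that score, standing in for the speech-recognition guidance. The names `guided_sample`, `denoiser`, `guidance`, and `scale` are hypothetical, and the sketch is not the authors' algorithm.

```python
import numpy as np


def sigma_schedule(n_steps, sigma_max=1.0, sigma_min=0.01):
    """Geometric noise levels from sigma_max down to sigma_min, then a final 0."""
    return np.append(np.geomspace(sigma_max, sigma_min, n_steps), 0.0)


def tweedie_score(x, sigma, denoised):
    """Approximate the score grad log p(x) from a denoiser output (Tweedie's formula)."""
    return (denoised - x) / sigma**2


def guided_sample(y, denoiser, n_steps=10, sigma_max=1.0, seed=0,
                  guidance=None, scale=0.0):
    """Hypothetical sampler: Euler steps of the VE probability-flow ODE.

    y         -- noisy observation (1-D array of samples), used as conditioning
    denoiser  -- assumed discriminative model: denoiser(x, sigma, y) -> clean estimate
    guidance  -- optional callable returning an extra gradient (e.g. from an ASR loss)
    scale     -- weight of the guidance gradient
    """
    rng = np.random.default_rng(seed)
    sigmas = sigma_schedule(n_steps, sigma_max)
    # Start near the observation, perturbed to the largest noise level (an assumption).
    x = y + sigmas[0] * rng.standard_normal(y.shape)
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        # One cheap discriminative forward pass replaces a learned score network.
        score = tweedie_score(x, s_cur, denoiser(x, s_cur, y))
        if guidance is not None:
            score = score + scale * guidance(x)
        # Probability-flow ODE for a VE process: dx/dsigma = -sigma * score.
        x = x + (s_next - s_cur) * (-s_cur * score)
    return x


# Usage with a toy stand-in "discriminative model": the exact posterior mean
# under a Gaussian prior centred on the observation (illustration only).
y = np.sin(np.linspace(0.0, 2.0 * np.pi, 160))
s0 = 0.5
toy_denoiser = lambda x, s, cond: (s0**2 * x + s**2 * cond) / (s0**2 + s**2)
enhanced = guided_sample(y, toy_denoiser, n_steps=30)
```

Because each step needs only one evaluation of the discriminative model, the per-step cost is fixed and small. This is the property that would make a low-latency, online variant plausible. The guidance term simply shifts the estimated score, so an external model can steer the samples without retraining the denoiser.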