Online generative speech enhancement with diffusion gradient guidance

LI Chenda; QIAN Yanmin

doi:10.12395/0371-0025.2024270

LI Chenda, QIAN Yanmin. Online generative speech enhancement with diffusion gradient guidance[J]. ACTA ACUSTICA, 2025, 50(6): 1644-1651. DOI: 10.12395/0371-0025.2024270


Abstract

This paper proposes a generative speech enhancement method based on diffusion gradient guidance. The method addresses the high computational cost of multi-step inference that arises when diffusion models are applied to speech separation tasks. A discriminative model is used to quickly estimate the data gradients required during diffusion inference, which avoids costly neural network computations and makes real-time speech enhancement with diffusion models feasible. Experiments show that the approach reduces computational overhead, yielding an online generative speech enhancement model with a latency of 50 ms, and also improves the quality of the separated speech. The results further show that gradient guidance from a speech recognition model improves recognition accuracy on the enhanced speech.
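The abstract does not give the update rule, so the following is only a minimal sketch of the general idea of replacing a learned score network with a cheap gradient estimate. It assumes a variance-exploding diffusion process and approximates the data gradient by the score of a Gaussian centred on a discriminative model's clean-speech estimate. The names `guided_score` and `reverse_diffusion`, the noise schedule, and the noise-free update are all illustrative assumptions, not the authors' method.

```python
def guided_score(x_t, x0_hat, sigma):
    """Gradient of log N(x_t; x0_hat, sigma^2 I) with respect to x_t.

    Assumption: x0_hat stands in for the output of a discriminative
    enhancement model; here it is just a list of floats.
    """
    return [(c - x) / sigma ** 2 for x, c in zip(x_t, x0_hat)]


def reverse_diffusion(noisy, x0_hat, sigmas):
    """Noise-free reverse steps of a variance-exploding process.

    sigmas is a decreasing noise schedule (hypothetical values).
    Each step moves x along the guided gradient by the amount of
    variance removed between two consecutive noise levels.
    """
    x = list(noisy)
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        step = s_cur ** 2 - s_next ** 2  # variance removed in this step
        score = guided_score(x, x0_hat, s_cur)
        x = [xi + step * gi for xi, gi in zip(x, score)]
    return x


# Toy usage with made-up sample values.
noisy = [1.0, -2.0, 0.5]
x0_hat = [0.2, -0.1, 0.3]
sigmas = [1.0, 0.5, 0.25, 0.1]
enhanced = reverse_diffusion(noisy, x0_hat, sigmas)
```

In this toy setting each step shrinks the gap to `x0_hat` by the factor `s_next**2 / s_cur**2`, and no neural network is evaluated inside the loop, which is the kind of saving the abstract describes.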
