MelGAN vocoder based on a window self-attention mechanism
Abstract
Vocoders based on generative adversarial networks offer significant efficiency advantages for real-time speech generation. However, improvements in speech quality often come at the cost of increased model size or reduced generalization ability. In this paper, a MelGAN vocoder based on a window self-attention mechanism is proposed. Long-term dependencies in speech are captured effectively by introducing a window self-attention mechanism together with a layer-wise window-shifting strategy. A mel-spectrogram loss is incorporated to suppress noise during training, which enhances the quality of the synthesized speech. Compared with conventional methods that model long-term dependencies using dilated convolutions, the proposed model captures speech features efficiently while maintaining a lower parameter count. Experimental results demonstrate that the proposed model outperforms the classic MelGAN on both subjective mean opinion scores and objective speech-quality model scores in single-speaker scenarios, while exhibiting strong generalization to unseen speakers. Moreover, it achieves synthesis quality comparable to the high-performance HiFi-GAN vocoder with significantly fewer parameters and faster inference speed.
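To make the core mechanism concrete, the following is a minimal NumPy sketch of windowed self-attention with a layer-wise shift, in the spirit described above. It is an illustration only: the window length, the circular shift, and the use of identity Q/K/V projections are assumptions for clarity, not the paper's actual architecture, which would use learned projections and multiple heads.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(x, window, shift=0):
    """Self-attention restricted to non-overlapping windows.

    x      : (T, d) feature sequence, T divisible by `window`
    window : window length (attention is computed within each window)
    shift  : circular offset applied before partitioning; alternating
             the shift across layers lets information flow across
             window boundaries (the layer-wise shifting idea)
    """
    T, d = x.shape
    assert T % window == 0, "sequence length must be divisible by window"
    if shift:
        x = np.roll(x, -shift, axis=0)
    out = np.empty_like(x)
    for s in range(0, T, window):
        w = x[s:s + window]                     # (window, d)
        # identity Q/K/V projections for illustration; real models learn them
        scores = w @ w.T / np.sqrt(d)           # (window, window)
        out[s:s + window] = softmax(scores) @ w
    if shift:
        out = np.roll(out, shift, axis=0)       # undo the shift
    return out
```

Because each window attends only within itself, cost grows linearly in sequence length for a fixed window size, which is one way a windowed mechanism can stay cheaper than modeling long-range context with ever-larger dilated convolutions.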