Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

WAN Yi, LI Changtao, LIU Sichen, YANG Feiran, YANG Jun. MelGAN vocoder based on a window self-attention mechanism[J]. ACTA ACUSTICA. DOI: 10.12395/0371-0025.2024357

MelGAN vocoder based on a window self-attention mechanism

  • Vocoders based on generative adversarial networks (GANs) offer significant advantages in real-time speech generation efficiency. However, improvements in speech quality often come at the cost of a larger model or reduced generalization ability. This paper proposes a MelGAN vocoder based on a window self-attention mechanism. By introducing window self-attention together with a layer-wise shifting strategy, the model effectively captures long-term dependencies in speech. A mel-spectrogram loss is incorporated during training to suppress noise and improve the quality of the synthesized speech. Compared with conventional methods that model long-term dependencies using dilated convolutions, the proposed model captures speech features efficiently while keeping a lower parameter count. Experimental results show that, in single-speaker scenarios, the proposed model outperforms the classic MelGAN on both subjective mean opinion scores (MOS) and scores from a speech quality evaluation model, while exhibiting strong generalization to unseen speakers. Moreover, it achieves synthesis quality comparable to that of the high-performance HiFi-GAN vocoder, with significantly fewer parameters and faster inference.

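The abstract states that window self-attention with a layer-wise shifting strategy captures long-term dependencies that conventional methods model with dilated convolutions. The pure-Python sketch below illustrates the general idea on a sequence of feature frames; it is not the authors' implementation. Several details are assumptions not given in the source: a Swin-Transformer-style shift of half a window on every other layer, identity query/key/value projections, and a plain residual connection. The function names (`window_bounds`, `window_self_attention`, `stacked_layers`) are hypothetical.

```python
import math
from typing import List, Tuple

Frame = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(window: List[Frame]) -> List[Frame]:
    """Scaled dot-product self-attention restricted to one window.

    Assumption: identity query/key/value projections; a trained model
    would use learned projection matrices.
    """
    d = len(window[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in window:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in window]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, window)) for j in range(d)])
    return out


def window_bounds(length: int, window: int, shift: int) -> List[Tuple[int, int]]:
    """Partition [0, length) into windows (expects 0 <= shift < window).

    With shift > 0 the interior boundaries are offset so that the first
    window covers only `shift` frames.
    """
    cuts = [0]
    pos = shift if shift > 0 else window
    while pos < length:
        cuts.append(pos)
        pos += window
    cuts.append(length)
    return list(zip(cuts[:-1], cuts[1:]))


def window_self_attention(x: List[Frame], window: int, shift: int = 0) -> List[Frame]:
    # Each frame attends only to frames inside its own (possibly shifted) window.
    out: List[Frame] = []
    for start, end in window_bounds(len(x), window, shift):
        if end > start:
            out.extend(attend(x[start:end]))
    return out


def stacked_layers(x: List[Frame], window: int, num_layers: int) -> List[Frame]:
    """Assumption: alternate unshifted and half-window-shifted layers,
    each followed by a residual connection."""
    for layer in range(num_layers):
        shift = window // 2 if layer % 2 == 1 else 0
        y = window_self_attention(x, window, shift)
        x = [[a + b for a, b in zip(xi, yi)] for xi, yi in zip(x, y)]
    return x


if __name__ == "__main__":
    print(window_bounds(8, 4, 0))  # [(0, 4), (4, 8)]
    print(window_bounds(8, 4, 2))  # [(0, 2), (2, 6), (6, 8)]
```

With a single unshifted layer, a frame never sees frames outside its window; after a second, shifted layer, information crosses the old window boundary. Per layer, the cost grows linearly with sequence length for a fixed window size, which is one plausible reason such a design can stay small compared with full self-attention.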
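The abstract does not specify the exact form of the mel-spectrogram loss. A common choice, used for example by HiFi-GAN, is the L1 distance between log-mel spectrograms of the real and generated waveforms. The minimal pure-Python sketch below assumes that form; the sample rate, FFT size, hop length, number of mel bands, and the HTK-style mel formula are illustrative assumptions rather than the paper's settings, and the naive DFT is used only for clarity. The function names are hypothetical.

```python
import cmath
import math
from typing import List

Matrix = List[List[float]]


def hann(n: int) -> List[float]:
    # Periodic Hann window.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal: List[float], n_fft: int, hop: int) -> Matrix:
    """Magnitude STFT via a naive DFT (O(n_fft^2) per frame; fine for a demo)."""
    win = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        seg = [s * w for s, w in zip(signal[start:start + n_fft], win)]
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft)))
            for k in range(n_fft // 2 + 1)
        ])
    return frames


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sample_rate: int) -> Matrix:
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    mel_max = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, center, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_freqs:
            if lo < f <= center:
                row.append((f - lo) / (center - lo))
            elif center < f < hi:
                row.append((hi - f) / (hi - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel(signal, sample_rate=16000, n_fft=64, hop=16, n_mels=8, eps=1e-5) -> Matrix:
    # Illustrative (assumed) analysis settings; eps avoids log(0).
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    return [
        [math.log(max(eps, sum(w * s for w, s in zip(row, frame)))) for row in bank]
        for frame in stft_magnitude(signal, n_fft, hop)
    ]


def mel_loss(real: List[float], fake: List[float], **kwargs) -> float:
    """Assumption: mean absolute (L1) difference between log-mel spectrograms.
    Both signals must be at least n_fft samples long."""
    a, b = log_mel(real, **kwargs), log_mel(fake, **kwargs)
    diffs = [abs(x - y) for fa, fb in zip(a, b) for x, y in zip(fa, fb)]
    return sum(diffs) / len(diffs)
```

Because the loss compares energy in every mel band, spurious energy that the generator adds where the reference spectrum is quiet is penalized directly, which is consistent with the abstract's claim that this term helps suppress noise.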