Whisper to normal speech conversion using deep convolutional neural networks

LIAN Hailun; ZHOU Jian; HU Yuting; ZHENG Wenming

doi:10.15949/j.cnki.0371-0025.2020.01.017

LIAN Hailun, ZHOU Jian, HU Yuting, ZHENG Wenming. Whisper to normal speech conversion using deep convolutional neural networks[J]. ACTA ACUSTICA, 2020, 45(1): 137-144. DOI: 10.15949/j.cnki.0371-0025.2020.01.017

Abstract

Whisper is a special phonation mode. Whisper-to-normal speech conversion is a key method for improving the quality and intelligibility of whispered speech. We propose a deep convolutional neural network (DCNN) that makes full use of the correlation between the frequency domain and the time domain of speech for whisper conversion. Its convolutional layer extracts correlation features across frequency and time from the spectral envelopes of consecutive frames, while its fully connected layer fits the mapping function between the whisper features extracted by the convolutional layer and the corresponding normal speech. Experimental results show that the Mel Cepstral Distance (CD) of the converted speech decreases by 4.64%, while the Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), and Mean Opinion Score (MOS) increase by 5.41%, 5.77%, and 9.68%, respectively.
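
To make the data flow described in the abstract concrete, the following is a minimal, untrained sketch in plain Python. It is not the authors' implementation: the window length, number of frequency bins, kernel size, number of kernels, the ReLU activation, and the random weights are all assumptions chosen for illustration, since the abstract does not specify them. It only shows how a convolution over a block of consecutive spectral-envelope frames (time by frequency) can feed a fully connected layer that outputs one normal-speech envelope frame.

```python
import random

def conv2d_valid(x, kernel):
    """Valid 2-D cross-correlation of x (frames x bins) with one kernel, then ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(x) - kh + 1):
        row = []
        for j in range(len(x[0]) - kw + 1):
            s = sum(x[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))  # ReLU is an assumed activation
        out.append(row)
    return out

def dense(vec, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(w_row, vec)) + b
            for w_row, b in zip(weights, bias)]

def convert_frame(context, kernels, fc_w, fc_b):
    """Map a window of whisper frames to one normal-speech envelope frame."""
    feats = []
    for k in kernels:
        fmap = conv2d_valid(context, k)
        feats.extend(v for row in fmap for v in row)  # flatten each feature map
    return dense(feats, fc_w, fc_b)

# Hypothetical sizes, not taken from the paper.
N_FRAMES, N_BINS, N_KERNELS, K = 5, 8, 2, 3
rng = random.Random(0)

# Random (untrained) parameters, for shape illustration only.
kernels = [[[rng.uniform(-0.1, 0.1) for _ in range(K)] for _ in range(K)]
           for _ in range(N_KERNELS)]
n_feats = N_KERNELS * (N_FRAMES - K + 1) * (N_BINS - K + 1)
fc_w = [[rng.uniform(-0.1, 0.1) for _ in range(n_feats)] for _ in range(N_BINS)]
fc_b = [0.0] * N_BINS

# Stand-in for the spectral envelopes of consecutive whisper frames.
whisper_context = [[rng.random() for _ in range(N_BINS)] for _ in range(N_FRAMES)]
normal_frame = convert_frame(whisper_context, kernels, fc_w, fc_b)
```

In this sketch each kernel spans several frames and several frequency bins at once, which is one way to capture the joint time-frequency correlation the abstract refers to. A real system would learn the parameters from paired whisper and normal speech.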
