Whisper to normal speech conversion using deep convolutional neural networks
-
Abstract
Whisper is a special phonation mode. Whisper-to-normal speech conversion is a key method for improving the quality and intelligibility of whispered speech. We propose a deep convolutional neural network (DCNN) that makes full use of the correlation between the frequency and time domains of speech for whisper conversion. Its convolutional layer extracts the correlation features between the frequency and time domains of the spectral envelope over consecutive frames, while the fully connected layer fits the mapping function between the whisper features extracted by the convolutional layer and the corresponding normal speech. Experimental results show that the Mel Cepstral Distance (CD) of the converted speech decreases by 4.64%, while the Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), and Mean Opinion Score (MOS) increase by 5.41%, 5.77%, and 9.68%, respectively.
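The structure described above, a convolutional stage over a time-frequency patch followed by a fully connected mapping, can be illustrated with a minimal NumPy forward pass. This is only a sketch under stated assumptions, not the network used in the paper: the patch size (9 frames by 40 envelope bins), the kernel count and size, the hidden width, the ReLU activations, the random weights, and the choice to predict a single normal-speech frame are all hypothetical, as are the names `conv2d_valid`, `init_params`, and `dcnn_forward`.

```python
import numpy as np


def conv2d_valid(x, kernels):
    """Valid 2-D cross-correlation of one patch with a bank of kernels.

    x:       (T, F) whisper spectral-envelope patch (frames x frequency bins)
    kernels: (K, kt, kf) filters spanning kt frames and kf bins
    returns: (K, T - kt + 1, F - kf + 1) feature maps
    """
    # windows has shape (T - kt + 1, F - kf + 1, kt, kf)
    windows = np.lib.stride_tricks.sliding_window_view(x, kernels.shape[1:])
    return np.einsum("tfij,kij->ktf", windows, kernels)


def init_params(rng, n_frames=9, n_bins=40, n_kernels=8, ksize=(3, 5), n_hidden=64):
    """Random weights for illustration only; every size here is an assumption."""
    kt, kf = ksize
    n_flat = n_kernels * (n_frames - kt + 1) * (n_bins - kf + 1)
    return {
        "kernels": 0.1 * rng.standard_normal((n_kernels, kt, kf)),
        "conv_bias": np.zeros(n_kernels),
        "w1": 0.01 * rng.standard_normal((n_flat, n_hidden)),
        "b1": np.zeros(n_hidden),
        "w2": 0.01 * rng.standard_normal((n_hidden, n_bins)),
        "b2": np.zeros(n_bins),
    }


def dcnn_forward(patch, params):
    """Map a patch of consecutive whisper frames to one normal-speech frame."""
    # Convolutional stage: joint time-frequency features of the envelope.
    maps = conv2d_valid(patch, params["kernels"])
    maps = np.maximum(maps + params["conv_bias"][:, None, None], 0.0)
    # Fully connected stage: regression from those features to the target frame.
    hidden = np.maximum(maps.reshape(-1) @ params["w1"] + params["b1"], 0.0)
    return hidden @ params["w2"] + params["b2"]


rng = np.random.default_rng(0)
params = init_params(rng)
whisper_patch = rng.standard_normal((9, 40))  # stand-in for real features
predicted_frame = dcnn_forward(whisper_patch, params)
print(predicted_frame.shape)
```

Because each kernel spans several frames and several frequency bins at once, every feature map value depends on both the temporal and the spectral neighbourhood of the envelope, which is the kind of joint correlation the convolutional stage is meant to capture. A trained model would learn these weights from paired whisper and normal-speech data.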
-
-