Whisper to normal conversion based on low dimension feature mapping
-
Abstract
To characterize the relationship between whispered speech and its corresponding normal speech for whisper-to-normal speech conversion, low-dimensional features of the spectral envelopes of whispered and normal speech are extracted with a sparse auto-encoder. Two back-propagation (BP) networks are then trained in this low-dimensional space: one models the spectral relation between the whisper and its corresponding normal speech, and the other models the relation between the whisper spectrum and the pitch of normal speech. In the conversion stage, the spectral envelope of the whisper is sparsely encoded to obtain its low-dimensional spectral envelope feature. The low-dimensional normal-speech feature and the pitch are then estimated by the two trained BP networks, respectively. Sparse decoding then recovers the spectral envelope of normal speech, which is used to reconstruct the normal speech. Experimental results show that the cepstral distance of the normal speech estimated by the proposed method is 10% lower than that of the GMM-based method. Subjective listening tests also indicate better naturalness and intelligibility for the proposed method.
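The conversion stage described above can be illustrated with a minimal per-frame sketch. It is not the authors' implementation: the dimensions (`ENV_DIM`, `CODE_DIM`, `HIDDEN_DIM`), the single hidden layer per BP network, the sigmoid activations, and all function and key names are hypothetical, and the randomly initialised weights merely stand in for parameters that would come from training. Whether whisper and normal speech share one auto-encoder is not stated in the abstract; the sketch simply uses one encoder for the whisper envelope and one decoder for the normal-speech envelope.

```python
import math
import random

# Hypothetical sizes; the abstract does not specify them.
ENV_DIM = 64      # spectral-envelope dimension per frame
CODE_DIM = 16     # low-dimensional sparse code
HIDDEN_DIM = 32   # hidden units in each BP network


def make_layer(n_in, n_out, rng):
    """Randomly initialised dense layer; a placeholder for trained weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x, sigmoid=False):
    """Affine map W x + b, optionally followed by a sigmoid."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    if sigmoid:
        out = [1.0 / (1.0 + math.exp(-o)) for o in out]
    return out


def build_model(rng):
    """Collect all layers used at conversion time (untrained placeholders)."""
    return {
        "encoder": make_layer(ENV_DIM, CODE_DIM, rng),        # sparse encoding
        "spec_hidden": make_layer(CODE_DIM, HIDDEN_DIM, rng),  # BP net 1
        "spec_out": make_layer(HIDDEN_DIM, CODE_DIM, rng),
        "pitch_hidden": make_layer(CODE_DIM, HIDDEN_DIM, rng), # BP net 2
        "pitch_out": make_layer(HIDDEN_DIM, 1, rng),
        "decoder": make_layer(CODE_DIM, ENV_DIM, rng),        # sparse decoding
    }


def convert_frame(whisper_envelope, model):
    """Map one whisper envelope frame to a normal-speech envelope and pitch."""
    # 1. Encode the whisper envelope into the low-dimensional space.
    whisper_code = dense(model["encoder"], whisper_envelope, sigmoid=True)
    # 2. BP network 1: whisper code -> normal-speech code.
    hidden = dense(model["spec_hidden"], whisper_code, sigmoid=True)
    normal_code = dense(model["spec_out"], hidden, sigmoid=True)
    # 3. BP network 2: whisper code -> pitch of normal speech (linear output).
    hidden = dense(model["pitch_hidden"], whisper_code, sigmoid=True)
    pitch = dense(model["pitch_out"], hidden)[0]
    # 4. Decode the normal-speech code back to a full spectral envelope.
    normal_envelope = dense(model["decoder"], normal_code)
    return normal_envelope, pitch


if __name__ == "__main__":
    rng = random.Random(0)
    model = build_model(rng)
    frame = [rng.random() for _ in range(ENV_DIM)]  # synthetic input frame
    envelope, pitch = convert_frame(frame, model)
    print(len(envelope), pitch)
```

With trained weights in place of the random ones, the returned envelope and pitch for each frame would be passed to a vocoder or synthesis filter to reconstruct the normal speech; that step is outside this sketch.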
-
-