Voice conversion using Bayesian analysis and dynamic kernel features
Graphical Abstract
Abstract
When the training utterances are sparse, voice conversion based on the Mixture of Probabilistic Linear Regressions (MPLR) suffers from overfitting. To address this problem, we adopt dynamic kernel features in place of the cepstrum features of the original speaker and estimate the transformation parameters in the maximum a posteriori (MAP) sense with Bayesian inference. First, the features of the original speaker are converted into dynamic kernel features by a kernel transformation. Then, prior information on the transformation parameters is introduced. Finally, according to different assumptions about the conversion error, we propose two methods for estimating the transformation parameters. Compared with MPLR, the proposed method achieves a 4.25% relative decrease in average cepstrum distortion in objective evaluations and obtains higher naturalness and similarity scores in subjective evaluations. Experimental results indicate that the proposed method alleviates the overfitting problem.
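As a rough illustration of the estimation step summarized above, the following sketch shows MAP estimation of a linear transform applied to kernel features of the source frames, under a zero-mean Gaussian prior on the transform and Gaussian conversion error (in which case MAP estimation reduces to ridge-regularized least squares). The RBF kernel, the isotropic prior, and all function and variable names are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch (illustrative, not the paper's implementation) of MAP
# estimation of a linear transform on kernel features of source frames.
import numpy as np

def kernel_features(X, centers, gamma=1.0):
    """Map source frames X (n, d) to kernel features (n, m) via an RBF kernel."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def map_estimate_transform(Phi, Y, noise_var=1.0, prior_var=10.0):
    """MAP estimate of W in Y ~ Phi @ W + noise, with prior W ~ N(0, prior_var * I).

    Under Gaussian error and a Gaussian prior this reduces to ridge-regularized
    least squares: W = (Phi^T Phi + lam * I)^{-1} Phi^T Y, lam = noise_var / prior_var.
    """
    lam = noise_var / prior_var
    m = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ Y)

# Toy usage with random data standing in for aligned source/target cepstra.
rng = np.random.default_rng(0)
X_src = rng.normal(size=(200, 24))                    # source cepstrum frames
Y_tgt = rng.normal(size=(200, 24))                    # aligned target cepstrum frames
centers = X_src[rng.choice(200, 16, replace=False)]   # kernel reference vectors
Phi = kernel_features(X_src, centers)
W = map_estimate_transform(Phi, Y_tgt)
Y_hat = Phi @ W                                       # converted (target-like) features
```

The prior variance plays the role of the regularizer: a tighter prior shrinks the transform more strongly, which is how the Bayesian treatment counteracts overfitting when training utterances are sparse.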