A model for speech recognition based on joint modeling of frame-based and segmental features
Abstract
This paper presents a model for speech recognition based on the joint modeling of frame-based and segmental features. The new model explicitly models the correlation among successive frames of the speech signal at the segment scale by using segmental features that represent the contours of spectral parameters. Through a proposed segmental-feature-dependent non-stationary time series model, the new model not only captures the correlation between frame-based and segmental features but also implicitly models the correlation among neighboring frames at the frame scale via a parametric mean trajectory function. A modified Viterbi algorithm based on the joint statistical distance of frame-based and segmental features is proposed, together with a training algorithm with embedded EM iterations for estimating the model parameters. Experimental results on a speaker-independent isolated Mandarin final database and a multi-speaker isolated Mandarin base syllable database show that the new model outperforms both the standard HMM and the trended HMM.
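The abstract does not state the model's equations. The following is therefore only a minimal sketch, under our own assumptions, of the general idea of a Viterbi search that scores each state segment by a joint frame-based and segmental distance. It assumes scalar features, a left-to-right model in which every state occupies at least one frame, a linear mean trajectory b0 + b1*d per state (d = frames since the segment began, in the spirit of a trended HMM), a segmental feature made of the segment's mean and least-squares slope, and a plain weighted sum of the two distances with a hypothetical weight w_seg. The names State, joint_segment_cost, and segmental_viterbi are hypothetical. The sketch does not model the dependence of the frame-based model on the segmental features and is not the paper's algorithm.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class State:
    """Hypothetical state parameters (assumptions, not taken from the paper)."""
    b0: float                        # trajectory intercept
    b1: float                        # trajectory slope per frame
    var: float                       # variance of frame residuals around the trajectory
    seg_mean: Tuple[float, float]    # Gaussian mean of (segment mean, segment slope)
    seg_var: Tuple[float, float]     # diagonal Gaussian variances of the same


def neg_log_gauss(x: float, mean: float, var: float) -> float:
    """Negative log density of a scalar Gaussian N(mean, var) at x."""
    return 0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def segment_features(seg: Sequence[float]) -> Tuple[float, float]:
    """Assumed segmental feature: mean and least-squares slope of the contour."""
    n = len(seg)
    mean = sum(seg) / n
    if n == 1:
        return mean, 0.0
    d_bar = (n - 1) / 2.0
    num = sum((d - d_bar) * (x - mean) for d, x in enumerate(seg))
    den = sum((d - d_bar) ** 2 for d in range(n))
    return mean, num / den


def joint_segment_cost(seg: Sequence[float], st: State, w_seg: float) -> float:
    """Frame-based trajectory distance plus weighted segmental-feature distance."""
    frame_cost = sum(
        neg_log_gauss(x, st.b0 + st.b1 * d, st.var) for d, x in enumerate(seg)
    )
    seg_cost = sum(
        neg_log_gauss(f, m, v)
        for f, m, v in zip(segment_features(seg), st.seg_mean, st.seg_var)
    )
    return frame_cost + w_seg * seg_cost


def segmental_viterbi(
    obs: Sequence[float], states: List[State], w_seg: float = 1.0
) -> Tuple[float, List[Tuple[int, int]]]:
    """Left-to-right segmental Viterbi; returns (best cost, [(start, end) per state])."""
    T, S = len(obs), len(states)
    if T < S:
        return math.inf, []  # every state needs at least one frame
    # D[s][t]: best cost with states 0..s covering frames [0, t), state s ending at t.
    D = [[math.inf] * (T + 1) for _ in range(S)]
    back = [[-1] * (T + 1) for _ in range(S)]
    for s in range(S):
        # Leave at least one frame for each of the remaining S - 1 - s states.
        for t in range(s + 1, T - (S - 1 - s) + 1):
            starts = [0] if s == 0 else range(s, t)
            for tau in starts:
                prev = 0.0 if s == 0 else D[s - 1][tau]
                if prev == math.inf:
                    continue
                cost = prev + joint_segment_cost(obs[tau:t], states[s], w_seg)
                if cost < D[s][t]:
                    D[s][t], back[s][t] = cost, tau
    # Trace the segment boundaries back from the final state at the last frame.
    segments, t = [], T
    for s in range(S - 1, -1, -1):
        tau = back[s][t]
        segments.append((tau, t))
        t = tau
    return D[S - 1][T], segments[::-1]
```

For example, with a flat state (b0 = 0, b1 = 0) followed by a rising state (b0 = 5, b1 = 1), the observations [0.0, 0.1, -0.1, 5.0, 6.0, 7.0] are split as [(0, 3), (3, 6)]. The search enumerates every segment start for every state and end frame, so it costs O(S·T²) segment evaluations. Scoring whole segments, rather than single frames, is what lets a segment-level feature such as the contour slope enter the decision at all.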