Cross corpus speech emotion recognition using semi-supervised discriminant analysis
-
Abstract
To address the mismatch in feature vector distributions between training and testing samples drawn from different speech emotion corpora, semi-supervised discriminant analysis is adopted. First, the optimal projection direction is determined from the labeled training samples of one corpus together with some unlabeled training samples from another corpus. Under the consistency assumption that nearby points are more likely to belong to the same class, the relationship among neighboring points is modeled with a p-nearest-neighbor graph to capture the distribution information of the unlabeled samples. The ratio of inter-class scatter to intra-class scatter is maximized while the manifold consistency of the unlabeled training samples is preserved, which yields the optimal projection vector. Two classification experiments are carried out. In the first, the eNTERFACE corpus is used for training and the Berlin corpus for testing, giving a recognition rate of 51.41%. In the second, the Berlin corpus is used for training and the eNTERFACE corpus for testing, giving a recognition rate of 45.76%. Compared with direct classification, the recognition rates are improved by 13.72% and 22.81% respectively, which demonstrates the effectiveness of the proposed method. Visualization of the data before and after applying the method shows that the mismatch between samples from different corpora is reduced and the recognition rate is enhanced.
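The projection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact objective, so the code assumes a commonly used semi-supervised discriminant analysis formulation: maximize a^T S_b a / a^T (S_t + alpha X^T L X + beta I) a, where S_b and S_t are the between-class and total scatter of the labeled samples and L is the Laplacian of a p-nearest-neighbor graph over labeled and unlabeled samples. The function names `knn_graph` and `sda_fit`, the 0/1 edge weights, and the parameters `alpha` and `beta` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def knn_graph(X, p):
    """Symmetric 0/1 adjacency of a p-nearest-neighbour graph (assumed weighting)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # squared distances
    np.fill_diagonal(d2, np.inf)                        # a point is not its own neighbour
    nearest = np.argsort(d2, axis=1)[:, :p]             # indices of the p closest points
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), p), nearest.ravel()] = 1.0
    return np.maximum(W, W.T)                           # connect i and j if either picks the other

def sda_fit(X_lab, y_lab, X_unlab, p=5, alpha=0.1, beta=0.01):
    """Return (mu, A): labeled-data mean and projection matrix (one column per direction)."""
    y_lab = np.asarray(y_lab)
    mu = X_lab.mean(axis=0)
    Xl = X_lab - mu
    d = Xl.shape[1]
    classes = np.unique(y_lab)

    # Between-class scatter of the labeled samples.
    Sb = np.zeros((d, d))
    for c in classes:
        Xk = Xl[y_lab == c]
        m = Xk.mean(axis=0)
        Sb += Xk.shape[0] * np.outer(m, m)
    St = Xl.T @ Xl                                      # total scatter (labeled)

    # Graph Laplacian over labeled + unlabeled samples (manifold term).
    X_all = np.vstack([X_lab, X_unlab])
    W = knn_graph(X_all, p)
    L = np.diag(W.sum(axis=1)) - W

    # Regularised denominator; beta > 0 keeps it positive definite.
    R = St + alpha * (X_all.T @ L @ X_all) + beta * np.eye(d)

    # Solve Sb a = lambda R a via Cholesky whitening, R = C C^T.
    C = np.linalg.cholesky(R)
    M = np.linalg.solve(C, np.linalg.solve(C, Sb).T)    # C^{-1} Sb C^{-T}
    M = (M + M.T) / 2.0                                 # remove rounding asymmetry
    vals, V = np.linalg.eigh(M)                         # eigenvalues in ascending order
    k = len(classes) - 1                                # at most c-1 useful directions
    top = V[:, ::-1][:, :k]
    A = np.linalg.solve(C.T, top)                       # map back: a = C^{-T} v
    return mu, A
```

In a cross-corpus setting, `X_lab` and `y_lab` would hold the labeled features of the training corpus and `X_unlab` the unlabeled features from the other corpus. Both training and testing features are then projected as `(X - mu) @ A` before being passed to any classifier.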
-
-