Cross-corpus speech emotion recognition using semi-supervised discriminant analysis

Jin Yun (金赟); Song Peng (宋鹏); Zheng Wenming (郑文明); Zhao Li (赵力)

doi:10.15949/j.cnki.0371-0025.2015.01.003

Abstract

When training and test samples come from different speech emotion corpora, their feature vectors follow mismatched distributions in feature space. Semi-supervised discriminant analysis is adopted to reduce this mismatch. First, an optimal projection direction is sought for the labeled training samples from one corpus together with a subset of unlabeled training samples from the other corpus. Under the consistency assumption that nearby points are more likely to share the same class, a p-nearest-neighbor graph models the relationships among neighboring unlabeled training samples, which captures the distribution of the unlabeled data. The optimal projection direction maximizes the ratio of between-class scatter to within-class scatter over the training samples while preserving the manifold structure of the unlabeled samples. Two experiments validate the method. In the first, training on the eNTERFACE corpus and testing on the Berlin corpus gives a recognition rate of 51.41%. In the second, training on the Berlin corpus and testing on the eNTERFACE corpus gives a recognition rate of 45.76%. Compared with classification without semi-supervised discriminant analysis, the recognition rates improve by 13.72% and 22.81%, respectively, which demonstrates the effectiveness of the algorithm. Visualization of the data before and after the experiments shows that semi-supervised discriminant analysis does reduce the mismatch between the feature distributions of the two corpora, and thereby raises the cross-corpus recognition rate.
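The projection step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the projection is obtained from a generalized eigenproblem in which the between-class scatter is maximized against the within-class scatter plus a graph-Laplacian penalty built from a p-nearest-neighbor graph over the unlabeled samples. The function names (`knn_graph_laplacian`, `sda_fit`) and the parameters `alpha` (weight of the manifold term) and `beta` (a small ridge term added for numerical stability) are hypothetical and do not come from the paper.

```python
import numpy as np


def knn_graph_laplacian(X, p=5):
    """Unnormalized Laplacian of a symmetric p-nearest-neighbor graph.

    Rows of X are samples. An edge links i and j if either one is among
    the other's p nearest neighbors (Euclidean distance).
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)              # a point is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :p]       # p nearest neighbors per row
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), p), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)                    # symmetrize the adjacency
    return np.diag(W.sum(axis=1)) - W


def sda_fit(X_lab, y, X_unlab, p=5, alpha=0.1, beta=1e-3, n_components=None):
    """Return a (d, k) projection matrix (assumed formulation, see lead-in)."""
    classes = np.unique(y)
    d = X_lab.shape[1]
    k = n_components or len(classes) - 1
    mean_all = X_lab.mean(axis=0)

    S_b = np.zeros((d, d))                    # between-class scatter
    S_w = np.zeros((d, d))                    # within-class scatter
    for c in classes:
        Xc = X_lab[y == c]
        mc = Xc.mean(axis=0)
        diff = (mc - mean_all)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)
        S_w += (Xc - mc).T @ (Xc - mc)

    # Manifold penalty: small when neighboring unlabeled points stay close
    # after projection.
    L = knn_graph_laplacian(X_unlab, p)
    J = X_unlab.T @ L @ X_unlab

    denom = S_w + alpha * J + beta * np.eye(d)
    denom = (denom + denom.T) / 2.0

    # Solve S_b a = lambda * denom a by Cholesky whitening.
    C = np.linalg.cholesky(denom)
    C_inv = np.linalg.inv(C)
    M = C_inv @ S_b @ C_inv.T
    _, U = np.linalg.eigh((M + M.T) / 2.0)    # eigenvalues in ascending order
    return C_inv.T @ U[:, ::-1][:, :k]        # keep the k largest


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for two corpora: labeled source, unlabeled target.
    y_src = np.repeat([0, 1, 2], 20)
    X_src = rng.normal(size=(60, 6)) + 2.0 * y_src[:, None]
    X_tgt = rng.normal(size=(40, 6)) + 1.0
    A = sda_fit(X_src, y_src, X_tgt, p=4)
    print("projection matrix shape:", A.shape)
```

In a cross-corpus setting, `X_src @ A` and `X_tgt @ A` would then be passed to an ordinary classifier. The classifier and the values of `p`, `alpha`, and `beta` are left open here because the abstract does not specify them.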
