Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

Speech emotion recognition based on pitch parameter normalization and statistical distribution model distance

Emotional speech recognition based on modified parameter and distance of statistical model of pitch

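  • Abstract: An improved Parzen-window method is proposed for estimating the probability density of pitch. Its adaptive window is determined by the frequency resolution of pitch extraction, so the statistical distribution model of pitch keeps high resolution in the low-frequency band and stays smooth in the high-frequency band. A gender discrimination algorithm that exploits the pitch distribution patterns of the two genders is proposed; it reaches a recognition rate of 98% on long sentences. By analyzing how the pitch mean, the pitch variance, and the statistical distribution model of pitch differ between genders, the pitch parameters are normalized for gender differences. The normalized pitch mean and pitch variance, together with the distance between statistical distribution models of pitch, are introduced as emotion feature parameters. Finally, the K-nearest-neighbor method is used to recognize emotions in a Chinese emotional speech corpus. Parameters extracted with the conventional method give a recognition rate of 73.8%, whereas the gender-normalized pitch parameters and the pitch statistical distribution distance raise the recognition rate to 81%.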

     

    Abstract: Based on the frequency resolution of pitch extraction, a modified Parzen-window method is proposed to obtain a statistical model of pitch; it maintains high resolution at low frequencies and eliminates jitter at high frequencies. A gender classification method that uses this statistical model is then proposed, and its accuracy reaches 98% when long sentences are classified. By analyzing the differences between genders, modified pitch parameters are proposed, and the following parameters are used for pattern classification: (1) modified means of pitch, (2) modified standard deviations of pitch, and (3) the Bhattacharyya distance between statistical models of pitch. Finally, an emotion recognition experiment based on K-nearest neighbor is described. A recognition rate of 81% is achieved with the proposed parameters, whereas only 73.8% is obtained with conventional parameters.
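
    The abstract names two building blocks, an adaptive Parzen-window density estimate and the Bhattacharyya distance, without giving formulas. The Python sketch below illustrates how they might fit together; it is not the authors' implementation. It assumes a Gaussian kernel and assumes that the pitch tracker quantizes the pitch period to whole samples, so the frequency resolution near f0 is roughly f0**2 / sample_rate. The function names, the 16 kHz sampling rate, and the `min_bw` floor are all hypothetical choices made for illustration.

```python
import math


def kernel_width(f0, sample_rate=16000.0, min_bw=1.0):
    """Assumed window width in Hz: the resolution of an integer-lag pitch
    tracker near f0, about f0**2 / sample_rate, floored at min_bw."""
    return max(min_bw, f0 * f0 / sample_rate)


def adaptive_parzen_pdf(pitches_hz, grid_hz, sample_rate=16000.0, min_bw=1.0):
    """Parzen-window density on grid_hz with one Gaussian kernel per pitch
    value; the kernel widens as the pitch value rises."""
    density = [0.0] * len(grid_hz)
    for f0 in pitches_hz:
        h = kernel_width(f0, sample_rate, min_bw)
        norm = 1.0 / (h * math.sqrt(2.0 * math.pi))
        for i, g in enumerate(grid_hz):
            z = (g - f0) / h
            density[i] += norm * math.exp(-0.5 * z * z)
    n = len(pitches_hz)
    return [d / n for d in density]


def bhattacharyya_distance(p, q):
    """-ln of the Bhattacharyya coefficient of two densities sampled on the
    same uniform grid; both are renormalized on that grid first."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q)) / math.sqrt(sum(p) * sum(q))
    bc = min(1.0, max(bc, 1e-300))  # guard against rounding and log(0)
    return max(0.0, -math.log(bc))


# Illustrative pitch values in Hz (made up, not taken from the paper).
grid = [50.0 + i for i in range(451)]  # 50 Hz to 500 Hz in 1 Hz steps
model_a = adaptive_parzen_pdf([180.0, 200.0, 220.0], grid)
model_b = adaptive_parzen_pdf([200.0, 230.0, 260.0], grid)
distance = bhattacharyya_distance(model_a, model_b)
```

    A distance of this kind could then sit alongside the normalized pitch mean and spread in the feature vector passed to a K-nearest-neighbor classifier.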

     
