Perception Auditory Scene Analysis for Speaker Recognition
Abstract: To address the loss of robustness that missing-data feature methods suffer in speaker recognition at low signal-to-noise ratios (SNRs), a missing-data feature extraction method based on Perception Auditory Scene Analysis is proposed. First, the missing-data feature spectrum of the speech is computed, and the perceptual speech content is derived from the perceptual characteristics of speech. The noisy speech then undergoes speech enhancement based on auditory perceptual characteristics, followed by a two-dimensional enhancement of its spectrogram, from which the speech distribution is obtained. This distribution is combined with the perceptual speech content and a missing-intensity parameter to extract the Perception Auditory Factor. Together with the missing-data feature spectrum, the Perception Auditory Factor decomposes feature extraction into distinct auditory scenes, each of which is analyzed and processed separately to improve the robustness of the speaker recognition system. Experimental results show that, at low SNRs from -10 dB to 10 dB and under four different noise types, the proposed method is more robust than all five comparison methods, with average recognition rates higher by 26.0%, 19.6%, 12.7%, 4.6%, and 6.5%, respectively. The proposed method searches for robust speech features in the time-frequency domain and is better suited to speaker recognition in low-SNR environments.