Audio fingerprint retrieval algorithm based on silence masking and frequency-domain segmentation

Chen Shuli; Zhang Xueshuai; Zhang Pengyuan; Liu Jian

doi:10.15949/j.cnki.0371-0025.2022.04.011

Audio fingerprint retrieval method using anti-fingerprint and frequency domain segmentation

Abstract

Audio retrieval accuracy often drops sharply when the query is corrupted by background sound or noise. To mitigate this problem, an audio fingerprint retrieval algorithm based on silence masking and frequency-domain segmentation is proposed. First, endpoint (voice activity) detection is applied as preprocessing to discard non-speech frames. The remaining valid speech frames are recombined, and fingerprint features are extracted from them using the energy differences between adjacent sub-bands, which addresses the poor robustness of fingerprints computed from silent frames. Then, in the retrieval and matching stage, the fingerprint is split into non-uniform frequency segments that are weighted according to how different audio signals are distributed over the frequency range, so that the similarity between the template audio and the query audio is computed more precisely. Experiments show that, compared with the classic Philips baseline algorithm, the proposed algorithm doubles the retrieval speed. At the same time, on data sets disturbed by background sound and similar interference, it yields absolute gains of 17.94% in average precision and 4.66% in recall. Compared with the latest Philips algorithm, the absolute gains in average precision and recall are 13.68% and 2.45%, respectively.
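The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The energy-threshold silence detector is a simple stand-in for the endpoint detection used in the paper. The 33 log-spaced bands between 300 Hz and 2000 Hz and the sign-of-difference bit rule follow the general shape of the Philips fingerprinting scheme. The segment boundaries and weights in `segmented_distance` are hypothetical placeholders, since the abstract does not give the values derived from the spectral distribution of the audio.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames of shape (n_frames, frame_len)."""
    n_frames = max(0, 1 + (len(x) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def drop_silent_frames(frames, rel_threshold_db=-40.0):
    """Silence masking (assumed rule): keep frames whose energy is within
    rel_threshold_db of the loudest frame, then close up the gaps."""
    if frames.shape[0] == 0:
        return frames
    energy = np.sum(frames ** 2, axis=1)
    ref = energy.max()
    if ref <= 0:
        return frames[:0]
    return frames[energy > ref * 10 ** (rel_threshold_db / 10)]

def band_energies(frames, sr, n_bands=33, f_lo=300.0, f_hi=2000.0):
    """Energy per frame in n_bands log-spaced bands between f_lo and f_hi."""
    window = np.hanning(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    band = np.digitize(freqs, edges) - 1  # -1 below f_lo, n_bands at or above f_hi
    energies = np.zeros((frames.shape[0], n_bands))
    for b in range(n_bands):
        energies[:, b] = power[:, band == b].sum(axis=1)
    return energies

def fingerprint(energies):
    """One bit per adjacent-band pair: sign of the change over time of the
    energy difference between neighbouring sub-bands."""
    d_band = energies[:, :-1] - energies[:, 1:]
    d_time = d_band[1:] - d_band[:-1]
    return (d_time > 0).astype(np.uint8)

def extract(x, sr, frame_len=2048, hop=256):
    """Frame the signal, mask silence, and compute the binary fingerprint."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    return fingerprint(band_energies(drop_silent_frames(frames), sr))

def segmented_distance(fp_a, fp_b,
                       segments=((0, 11), (11, 22), (22, 32)),  # hypothetical
                       weights=(0.5, 0.3, 0.2)):                # hypothetical
    """Weighted bit error rate, computed separately per frequency segment."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        raise ValueError("both fingerprints need at least one frame")
    diff = fp_a[:n] != fp_b[:n]
    total = sum(w * diff[:, lo:hi].mean() for (lo, hi), w in zip(segments, weights))
    return total / sum(weights)
```

A lower distance means a closer match: identical fingerprints give 0, while unrelated audio is expected to land near 0.5. In a real system the segment weights would be tuned to the frequency regions where the target audio carries the most discriminative energy.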
