Feature selection in phonotactic language recognition system
-
Abstract
Two feature selection methods, Information Gain (IG) and Weighted Log-Likelihood Ratio (WLLR), are introduced into phonotactic language recognition to reduce the dimensionality of the feature vectors. Together with the traditional Mutual Information (MI) and χ² test (CHI) criteria, they are compared on the NIST 2009 Language Recognition Evaluation (LRE) task. For each of the four criteria, different subsets of features are selected from the full set of N-gram features and used as the input feature vectors of the language recognition classifier. The experimental results show that IG and WLLR yield much lower-dimensional feature vectors without degrading recognition performance, and can even perform better than the system that uses all features. When the number of selected features is very small, IG and WLLR also outperform the existing MI and CHI criteria. These results indicate that IG and WLLR can effectively reduce the number of features and, to some extent, improve system performance.
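The abstract does not define the four criteria. As a rough illustration only, the sketch below scores phone N-grams with the textbook text-categorization definitions: a document-level 2x2 term/class table for MI and χ², IG over the presence or absence of an N-gram, and WLLR over add-one smoothed class-conditional N-gram frequencies. The per-language scores are combined by taking the maximum over languages. These formulas, the smoothing, the max-over-classes combination and every function name (`ngram_counts`, `select_features`, etc.) are assumptions for illustration, not necessarily the paper's exact choices. Real phonotactic systems usually use expected N-gram counts from phone lattices rather than the 1-best toy phone strings used here.

```python
import math
from collections import Counter


def ngram_counts(phones, n=2):
    """Count phone N-grams in one decoded utterance (toy 1-best phone list)."""
    return Counter(" ".join(phones[i:i + n]) for i in range(len(phones) - n + 1))


def contingency(docs, labels, term, cls):
    """Document-level 2x2 table: A=(t, c), B=(t, not c), C=(not t, c), D=(not t, not c)."""
    A = B = C = D = 0
    for doc, lab in zip(docs, labels):
        has, inc = term in doc, lab == cls
        if has and inc:
            A += 1
        elif has:
            B += 1
        elif inc:
            C += 1
        else:
            D += 1
    return A, B, C, D


def mutual_info(A, B, C, D):
    """Pointwise MI log(P(t,c) / (P(t) P(c))); -inf when t never occurs with c."""
    N = A + B + C + D
    if A == 0:
        return float("-inf")
    return math.log(A * N / ((A + C) * (A + B)))


def chi_square(A, B, C, D):
    """Chi-square statistic of the 2x2 term/class table."""
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / denom if denom else 0.0


def _sum_plogp(probs):
    """Sum of p*log(p), i.e. the negative entropy of a distribution (in nats)."""
    return sum(p * math.log(p) for p in probs if p > 0)


def info_gain(docs, labels, term, classes):
    """IG(t) = H(C) - H(C | presence/absence of t), in nats."""
    N = len(docs)
    ig = -_sum_plogp([labels.count(c) / N for c in classes])
    with_t = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_t = [lab for doc, lab in zip(docs, labels) if term not in doc]
    for subset in (with_t, without_t):
        if subset:
            ig += len(subset) / N * _sum_plogp([subset.count(c) / len(subset) for c in classes])
    return ig


def wllr(docs, labels, term, cls, vocab_size, alpha=1.0):
    """P(t|c) * log(P(t|c) / P(t|not c)) with add-alpha smoothed N-gram frequencies."""
    t_in = t_out = tot_in = tot_out = 0
    for doc, lab in zip(docs, labels):
        if lab == cls:
            t_in += doc.get(term, 0)
            tot_in += sum(doc.values())
        else:
            t_out += doc.get(term, 0)
            tot_out += sum(doc.values())
    p_in = (t_in + alpha) / (tot_in + alpha * vocab_size)
    p_out = (t_out + alpha) / (tot_out + alpha * vocab_size)
    return p_in * math.log(p_in / p_out)


def select_features(docs, labels, k, criterion):
    """Return the k N-grams with the highest score under IG, WLLR, MI or CHI."""
    vocab = sorted(set().union(*docs))
    classes = sorted(set(labels))

    def score(t):
        if criterion == "IG":
            return info_gain(docs, labels, t, classes)
        if criterion == "WLLR":
            return max(wllr(docs, labels, t, c, len(vocab)) for c in classes)
        if criterion in ("MI", "CHI"):
            f = mutual_info if criterion == "MI" else chi_square
            return max(f(*contingency(docs, labels, t, c)) for c in classes)
        raise ValueError(f"unknown criterion: {criterion}")

    # Stable sort: ties keep alphabetical N-gram order.
    return sorted(vocab, key=score, reverse=True)[:k]


# Toy data: two hypothetical "languages" L1 and L2, two utterances each.
docs = [ngram_counts(s.split()) for s in ("a b a b", "a b a c", "c d c d", "c d d c")]
labels = ["L1", "L1", "L2", "L2"]
print(select_features(docs, labels, 2, "WLLR"))  # -> ['a b', 'c d']
```

On this toy data, MI gives the singleton bigram "a c" the same score as the fully discriminative "a b", reflecting MI's known bias toward rare terms, while IG and χ² rank the four discriminative bigrams first. Whether this mirrors the paper's small-feature-set results cannot be concluded from a toy example.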
-