YIN Hua, HU Yuping. An Imbalanced Feature Selection Algorithm Based on Random Forest[J]. Acta Scientiarum Naturalium Universitatis SunYatseni, 2014,53(5):59-65.
High-dimensional, imbalanced data pose a challenge for data mining. Traditional feature selection algorithms assume a balanced class distribution, which leads to unsatisfactory results on imbalanced data. To address this problem, a new imbalanced feature selection algorithm, IBRFVS, is constructed using the variable selection mechanism embedded in random forest. IBRFVS builds diverse decision trees on balanced samples of the data and obtains the feature importance measurements of each decision tree by cross validation. The final feature importance list is the weighted average of the per-tree feature importance measurements, where each decision tree's weight is determined by the degree of consistency between that tree's predictions and the ensemble prediction. Random forest hyperparameter selection experiments and preprocessing comparison experiments on UCI datasets show that IBRFVS is more stable than, and superior to, traditional feature selection algorithms when the hyperparameter K is the square root of the number of features.
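
The balanced sampling and weighted aggregation steps described above can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: classes are balanced by random undersampling to the minority-class size, a tree's weight is the fraction of samples on which its prediction agrees with the majority vote of all trees, and the final importance is the weight-normalized average of the per-tree importances. The function names (`balanced_sample`, `ensemble_vote`, `tree_weights`, `weighted_importance`) and the toy inputs are hypothetical and do not come from the paper.

```python
from collections import Counter
import random


def balanced_sample(labels, rng):
    """Return sorted indices of a class-balanced subset.

    Assumption: every class is randomly undersampled to the
    size of the smallest class.
    """
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    chosen = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, n_min))
    return sorted(chosen)


def ensemble_vote(tree_predictions):
    """Majority vote across trees for each sample."""
    n_samples = len(tree_predictions[0])
    return [
        Counter(preds[i] for preds in tree_predictions).most_common(1)[0][0]
        for i in range(n_samples)
    ]


def tree_weights(tree_predictions):
    """Weight of each tree: its agreement rate with the ensemble vote (assumed form)."""
    vote = ensemble_vote(tree_predictions)
    return [
        sum(p == v for p, v in zip(preds, vote)) / len(vote)
        for preds in tree_predictions
    ]


def weighted_importance(tree_importances, weights):
    """Weight-normalized average of per-tree feature importances."""
    total = sum(weights)
    n_features = len(tree_importances[0])
    return [
        sum(w * imp[j] for w, imp in zip(weights, tree_importances)) / total
        for j in range(n_features)
    ]


# Toy inputs: 3 trees, 4 validation samples, 2 features (made-up values).
tree_predictions = [[1, 0, 1, 1], [1, 0, 0, 1], [0, 0, 1, 1]]
tree_importances = [[0.6, 0.4], [0.2, 0.8], [0.4, 0.6]]

weights = tree_weights(tree_predictions)
importance = weighted_importance(tree_importances, weights)
ranking = sorted(range(len(importance)), key=lambda j: -importance[j])
```

In this sketch a tree that disagrees with the ensemble more often contributes less to the final ranking, which is one plausible reading of "consistent degree"; the paper's actual weighting and its cross-validated importance measure may differ.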