An Imbalanced Feature Selection Algorithm Based on Random Forest

YIN Hua; HU Yuping

Research article | Updated: 2023-12-11

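- An Imbalanced Feature Selection Algorithm Based on Random Forest
- Acta Scientiarum Naturalium Universitatis SunYatseni, 2014, Vol. 53, No. 5, pp. 59-65
- Author affiliation: School of Information, Guangdong University of Finance & Economics, Guangzhou, Guangdong 510320
- Print publication date: 2014
- Online publication date: 2014-9-25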

YIN Hua, HU Yuping. An Imbalanced Feature Selection Algorithm Based on Random Forest[J]. Acta Scientiarum Naturalium Universitatis SunYatseni, 2014,53(5):59-65.

Abstract

High-dimensional, imbalanced data is a current challenge for data mining. Traditional feature selection methods assume a balanced class distribution and therefore perform poorly on imbalanced data. To address this problem, a new imbalanced random forest feature selection algorithm, IBRFVS, is constructed using the variable selection mechanism embedded in random forest. IBRFVS builds diverse decision trees on balanced samples of the data and obtains the feature importance measures of each individual decision tree by cross-validation. The final feature importance ranking is determined by averaging the per-tree feature importance measures with the decision-tree weights, where each tree's weight is determined by the degree of consistency between that tree's predictions and the ensemble prediction. Experiments on UCI datasets, covering random forest hyperparameter selection and a comparison of preprocessing methods, show that among four empirical values of the hyperparameter K, setting K to the square root of the number of features gives IBRFVS relatively stable performance that is better than traditional feature selection algorithms.
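The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the balanced samples are drawn, how consistency is measured, or how the weights are normalized. The sketch assumes undersampling to the minority-class size, measures consistency as the fraction of samples on which a tree agrees with the majority vote, and normalizes by the sum of the weights. All function names (`balanced_sample`, `k_sqrt`, `tree_weights`, `rank_features`) are hypothetical, and the per-tree predictions and importances are taken as given inputs rather than computed from fitted trees.

```python
import math
import random
from collections import Counter


def balanced_sample(labels, rng):
    """Return row indices with every class undersampled to the minority size.

    Assumption: the abstract only says "balanced sampling"; undersampling
    is one possible reading.
    """
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    chosen = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, n_min))
    return sorted(chosen)


def k_sqrt(n_features):
    """Hyperparameter K set to the square root of the feature count."""
    return max(1, int(math.sqrt(n_features)))


def tree_weights(tree_preds):
    """Weight each tree by its agreement rate with the majority vote.

    Assumption: "consistency with the ensemble prediction" is taken to be
    the fraction of samples where the tree matches the ensemble.
    """
    n_samples = len(tree_preds[0])
    ensemble = [
        Counter(p[j] for p in tree_preds).most_common(1)[0][0]
        for j in range(n_samples)
    ]
    return [
        sum(a == b for a, b in zip(p, ensemble)) / n_samples
        for p in tree_preds
    ]


def rank_features(tree_importances, weights):
    """Weighted average of per-tree importances, then sort descending."""
    total = sum(weights)
    n_features = len(tree_importances[0])
    scores = [
        sum(w * imp[f] for w, imp in zip(weights, tree_importances)) / total
        for f in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return scores, order


# Toy inputs: predictions of three trees on four samples, and the
# importance each tree assigns to three features (illustrative values).
preds = [[1, 0, 1, 1], [1, 0, 0, 1], [0, 0, 1, 1]]
importances = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.6, 0.1, 0.3]]
weights = tree_weights(preds)
scores, order = rank_features(importances, weights)
```

In a full pipeline, each tree would be fitted on the rows returned by `balanced_sample`, with `k_sqrt` giving the number of candidate features per split, and its importances estimated by cross-validation before being passed to `rank_features`.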

Keywords

imbalanced data; high-dimensional data; feature selection; random forest
