A Split-and-Conquer Method for Massive Data

Abstract: Lasso is an effective and widely used method for variable selection, but on high-dimensional massive data sets its computational cost becomes excessive. To address this, the paper proposes a split-and-conquer method. The high-dimensional data set is first divided evenly into K parts and variable selection is performed on each part; the feature sets selected from the parts are then merged, and variable selection is performed once more on the merged set. Six data sets are used in experiments to evaluate the method. Prediction results obtained with SVM, random forest, and neural network models indicate that the split-and-conquer method performs well on high-dimensional massive data and substantially reduces running time.
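The two-stage procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say whether the K parts split the samples or the features, so the sketch assumes the features are partitioned into K blocks. The Lasso solver is a plain coordinate-descent routine written for this example, and the names `lasso_select`, `split_and_conquer`, `demo`, the penalty `lam`, and the synthetic data are all hypothetical.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in Lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_select(X, y, cols, lam, n_iter=100):
    """Return the columns in `cols` whose Lasso coefficient is non-zero.

    Minimises (1/2n)*||y - X b||^2 + lam*||b||_1 over the given columns by
    cyclic coordinate descent. No intercept: data are assumed centred.
    """
    n = len(y)
    beta = {j: 0.0 for j in cols}
    resid = list(y)  # residual y - X b, with b = 0 at the start
    scale = {j: sum(X[i][j] ** 2 for i in range(n)) / n for j in cols}
    for _ in range(n_iter):
        for j in cols:
            if scale[j] == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of column j with the residual excluding j's own fit
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + scale[j] * beta[j]
            new = soft_threshold(rho, lam) / scale[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return [j for j in cols if beta[j] != 0.0]


def split_and_conquer(X, y, k, lam):
    """Select within K feature blocks, merge the survivors, select again."""
    p = len(X[0])
    # Assumption: split the feature indices into K near-equal blocks.
    blocks = [list(range(b, p, k)) for b in range(k)]
    merged = []
    for block in blocks:
        merged.extend(lasso_select(X, y, block, lam))  # stage 1: per block
    return lasso_select(X, y, sorted(merged), lam)     # stage 2: merged set


def demo(seed=0):
    """Synthetic example (hypothetical): only features 2, 11, 25 drive y."""
    rng = random.Random(seed)
    n, p = 200, 30
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [3.0 * r[2] - 2.0 * r[11] + 2.5 * r[25] + rng.gauss(0.0, 0.5) for r in X]
    return split_and_conquer(X, y, k=3, lam=0.2)


if __name__ == "__main__":
    print(demo())
```

In this sketch each first-stage fit sees only about p/K columns, which is where a time saving would plausibly come from. The second pass over the merged set is what removes columns that survived a block only because the truly relevant features were in another block.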

Authors: Wen Kun; Lan Xiaoran (School of Management, Nanchang University, Nanchang 330029, China; Jiangxi Administration Institute, Nanchang 330003, China; Cangzhou Central Sub-branch of People's Bank of China, Cangzhou, Hebei 061000, China)

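Affiliations: School of Management, Nanchang University; Jiangxi Administration Institute; Cangzhou Central Sub-branch of People's Bank of China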

Source: Statistics & Decision (《统计与决策》), CSSCI, Peking University Core Journal, 2018, No. 16, pp. 74-76 (3 pages)

Keywords: split-and-conquer method; variable selection; high-dimensional data

Classification code: O212.1 [Science: Probability Theory and Mathematical Statistics]

