A Split-and-Conquer Method for Massive Data
Abstract: Lasso is an effective and widely used method for variable selection, but on high-dimensional massive data sets its computational cost becomes excessive. To address this, the paper proposes a split-and-conquer method: the high-dimensional data set is first divided evenly into K parts and variable selection is performed on each part; the feature sets selected from the parts are then merged, and variable selection is performed once more on the merged set. The method is evaluated on six data sets. Prediction results from SVM, random forest, and neural network models indicate that the split-and-conquer method performs well on high-dimensional massive data and substantially reduces running time.
Authors: Wen Kun (温焜); Lan Xiaoran (兰晓然) (School of Management, Nanchang University, Nanchang 330029, China; Jiangxi Administration Institute, Nanchang 330003, China; Cangzhou Central Sub-branch of the People's Bank of China, Cangzhou, Hebei 061000, China)
Source: Statistics & Decision (《统计与决策》), CSSCI, PKU Core, 2018, No. 16, pp. 74-76 (3 pages)
Keywords: split-and-conquer method; variable selection; high-dimensional data
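
The two-stage procedure described in the abstract (select variables on each of K parts, merge the selected feature sets, then select again) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say whether the data are split by rows or by columns, so the sketch splits the observations (rows); the function names `lasso_select` and `split_and_conquer`, the penalty value `lam`, the plain-Python coordinate-descent Lasso without an intercept, and the synthetic data are all hypothetical choices made for the example.

```python
import random

def soft_threshold(value, lam):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_select(X, y, cols, lam, n_iter=50):
    """Return the columns in `cols` whose Lasso coefficient is nonzero.

    Minimises (1/2n)||y - Xb||^2 + lam*||b||_1 over the given columns by
    cyclic coordinate descent (no intercept; data assumed centred).
    """
    n = len(y)
    columns = {j: [row[j] for row in X] for j in cols}
    beta = {j: 0.0 for j in cols}
    residual = list(y)  # y - X*beta with beta = 0
    for _ in range(n_iter):
        for j in cols:
            col = columns[j]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, residual)) / n
            new_b = soft_threshold(rho, lam) / z
            delta = new_b - beta[j]
            if delta != 0.0:
                residual = [r - v * delta for r, v in zip(residual, col)]
                beta[j] = new_b
    return [j for j in cols if beta[j] != 0.0]

def split_and_conquer(X, y, k, lam):
    """Stage 1: Lasso on each of k row chunks; stage 2: Lasso on the merged set."""
    n, p = len(y), len(X[0])
    merged = set()
    for part in range(k):
        idx = range(part, n, k)  # assumption: interleaved, near-equal row split
        X_part = [X[i] for i in idx]
        y_part = [y[i] for i in idx]
        merged.update(lasso_select(X_part, y_part, list(range(p)), lam))
    # Second selection uses all rows but only the merged candidate features.
    return lasso_select(X, y, sorted(merged), lam)

# Synthetic example: 30 features, only features 0, 1, 2 carry signal.
random.seed(0)
n_rows, n_features = 300, 30
X = [[random.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_rows)]
y = [3.0 * r[0] - 2.0 * r[1] + 2.0 * r[2] + random.gauss(0.0, 0.5) for r in X]

selected = split_and_conquer(X, y, k=3, lam=0.3)
print(selected)
```

In this sketch the second Lasso pass only touches the merged candidate columns, which is where a time saving of the kind reported in the abstract would come from; the actual saving depends on K, the penalty, and the data.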