Abstract
Standard Support Vector Machine (SVM) training requires O(l^3) time and O(l^2) space, where l is the number of training examples, making it computationally infeasible on very large data sets. This paper presents a novel SVM training algorithm based on approximate solutions: the Approximate Vector Machine (AVM). AVM uses an incremental learning strategy to find an approximately optimal separating hyperplane, and applies warm-start and sampling tricks in each iteration to accelerate training. Theoretical analysis shows that the algorithm's time and space complexities are independent of the training set size, so it scales well on both counts. Experiments on very large data sets show that the proposed method greatly speeds up training while preserving the generalization performance of the original SVM classifier, and that the resulting classifier contains fewer support vectors and is therefore faster at prediction time.
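The abstract only sketches AVM at a high level, so the following is a minimal illustration of the general strategy it describes, not the paper's actual algorithm: grow a small set of candidate support vectors incrementally, use random sampling to find a violating point cheaply, and keep coefficients across iterations as a warm start. The function names, the linear kernel, and the perceptron-style coefficient update are all assumptions made for the sketch.

```python
import random

def linear_kernel(x, y):
    """Linear kernel for simplicity; a real system would likely use an RBF kernel."""
    return sum(a * b for a, b in zip(x, y))

def train_approx(X, y, kernel=linear_kernel, iters=200, sample_size=16, seed=0):
    """Illustrative incremental training loop (not AVM itself).

    Each round samples a subset of the data (the sampling trick),
    finds the sampled point with the worst margin under the current
    model, and updates its coefficient while keeping all previous
    coefficients (a warm start across iterations).
    """
    rng = random.Random(seed)
    core = []      # indices of vectors kept in the model ("core set")
    alpha = {}     # their coefficients, preserved between rounds

    def f(x):
        # Decision function built only from the small core set.
        return sum(alpha[i] * y[i] * kernel(X[i], x) for i in core)

    for _ in range(iters):
        sample = rng.sample(range(len(X)), min(sample_size, len(X)))
        j = min(sample, key=lambda i: y[i] * f(X[i]))  # worst margin in the sample
        if y[j] * f(X[j]) > 0:   # no violator found in the sample: stop early
            break
        if j not in alpha:
            core.append(j)
            alpha[j] = 0.0
        alpha[j] += 1.0          # perceptron-style correction on the violator
    return core, alpha, f

# Toy, linearly separable data to exercise the loop.
X = [(1, 1), (2, 1), (1.5, 0.5), (-1, -1), (-2, -1), (-1.5, -0.5)]
y = [1, 1, 1, -1, -1, -1]
core, alpha, f = train_approx(X, y)
```

Because the model is built only from the core set, its size (and hence prediction cost) is bounded by the number of iterations rather than by the full training set, which mirrors the scalability argument in the abstract.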
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2009, No. 11, pp. 208-212 (5 pages)
Funding
National Natural Science Foundation of China (60773177)
Fujian Provincial Young Talents Program (2008F3108)
Xiamen University of Technology Talent Introduction Program (YKJ08003R)
Keywords
Support vector machine
Kernel function
Incremental learning
Approximate solution
Core set