
Approximate Approach to Train SVM on Very Large Data Sets (cited by 2)
Abstract: Standard Support Vector Machine (SVM) training has O(l^3) time and O(l^2) space complexity, where l is the number of training samples, and is therefore computationally infeasible on very large data sets. This paper presents an approximate training algorithm, the Approximate Vector Machine (AVM), to scale kernel methods up to very large data sets. AVM uses an incremental learning strategy to find an approximately optimal separating hyperplane, and applies warm-start and probabilistic-sampling tricks in each iteration to accelerate training. Theoretical analysis shows that the algorithm's time and space complexity is independent of the training set size, so it scales well in both respects. Experiments on very large data sets show that AVM greatly speeds up training while preserving the generalization performance of the original SVM classifier; it also yields fewer support vectors, so the resulting classifier is faster at prediction time as well.
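To make the strategy the abstract describes concrete (grow a small core set incrementally, probe a random sample for margin violators, warm-start each refit), here is a minimal sketch of that loop. It is an illustration, not the paper's AVM: the inner solver is a plain subgradient method for a linear SVM rather than the paper's kernel solver, and the function names, parameters, and probe size of 59 are assumptions chosen for the example.

```python
import numpy as np

def fit_linear_svm(X, y, w0, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on the hinge loss, warm-started at w0."""
    w = w0.copy()
    t = 1
    for _ in range(epochs):
        for i in np.random.permutation(len(y)):
            eta = 1.0 / (lam * t)            # decreasing step size
            if y[i] * (X[i] @ w) < 1.0:      # margin violated: step toward y_i * x_i
                w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
            else:                            # no violation: only shrink (regularize)
                w = (1.0 - eta * lam) * w
            t += 1
    return w

def approx_train(X, y, n_iters=30, sample_size=59, seed=0):
    """Core-set loop: each round refits on the core set (warm start) and adds the
    worst violator found in a random probe of `sample_size` points. A probe of 59
    points gives a ~95% chance of hitting one of the top-5% violators, which is
    the CVM-style probabilistic speedup the abstract alludes to."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    core = list(rng.choice(len(y), size=2, replace=False))   # tiny seed core set
    for _ in range(n_iters):
        w = fit_linear_svm(X[core], y[core], w)              # warm start from previous w
        probe = rng.choice(len(y), size=sample_size, replace=False)
        margins = y[probe] * (X[probe] @ w)
        if margins.min() >= 1.0:     # no violator in the probe: stop early
            break
        worst = int(probe[np.argmin(margins)])
        if worst not in core:
            core.append(worst)
    return w, core
```

Because each refit touches only the core set and each probe touches only `sample_size` points, the per-iteration cost does not depend on l, which is the scaling property the abstract claims for AVM.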
Source: 《计算机科学》 (Computer Science), CSCD, Peking University core journal, 2009, Issue 11, pp. 208-212 (5 pages).
Funding: National Natural Science Foundation of China (60773177); Fujian Province Young Talents Project (2008F3108); Xiamen University of Technology Talent Introduction Project (YKJ08003R).
Keywords: Support vector machine, kernel function, incremental learning, approximate solution, core set

References (11)

  • 1 Vapnik V. Statistical Learning Theory[M]. New York: Wiley, 1998.
  • 2 Yu H, Yang J, Han J. Classifying large data sets using SVM with hierarchical clusters[C]//Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington DC, USA, 2003: 306-315.
  • 3 Erdem Z, Polikar R, Gurgen F S, et al. Ensemble of SVMs for incremental learning[C]//Multiple Classifier Systems, 2005: 246-256.
  • 4 Tsang I W, Kwok J T, Cheung P. Core vector machines: Fast SVM training on very large data sets[J]. JMLR, 2005, 6: 363-392.
  • 5 Zhu Yongsheng, Wang Chengdong, Zhang Youyun. Research on the performance of support vector machines with quadratic loss function[J]. Chinese Journal of Computers, 2003, 26(8): 982-989.
  • 6 Badoiu M, Clarkson K. Optimal core-sets for balls[C]//DIMACS Workshop on Computational Geometry, 2002.
  • 7 Har-Peled S, Roth D, Zimak D. Maximum margin coresets for active and noise tolerant learning[C]//Proceedings of the Twentieth International Joint Conference on Artificial Intelligence. Hyderabad, India, 2007.
  • 8 Li Xuehun, Zhu Yan, Sung E. Sequential bootstrapped support vector machines[J]. IEEE Trans. Neural Netw., 2005, 10(5): 1000-1017.
  • 9 Chang C-C, Lin C-J. LIBSVM: a library for support vector machines[OL]. 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm.
  • 10 Murphy P M, Aha D W. UCI repository of machine learning databases. Irvine, CA[OL]. http://www.ics.uci.edu/~mlearn/MLRepository.html, 2004.

Secondary references (10)

  • 1 Vapnik V N. The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1998.
  • 2 Chapelle O, Vapnik V N, Bousquet O, et al. Choosing multiple parameters for support vector machines. Machine Learning, 2002, 46(1): 131-159.
  • 3 Duan K, Keerthi S S, Poo A N. Evaluation of simple performance measures for tuning SVM hyperparameters. Department of Mechanical Engineering, National University of Singapore: Control Division Technical Report CD-01-11, 2001.
  • 4 Keerthi S S. Efficient tuning of SVM hyperparameters using radius/margin bound and iterative algorithms. Department of Mechanical Engineering, National University of Singapore: Control Division Technical Report CD-01-12, 2001.
  • 5 Vapnik V N, Chapelle O. Bounds on error expectation for support vector machines. Neural Computation, 2000, 12(9): 2013-2036.
  • 6 Burges C J C. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 1998, 2(2): 121-167.
  • 7 Scholkopf B, Mika S, Burges C J C, et al. Input space versus feature space in kernel-based methods. IEEE Transactions on Neural Networks, 1999, 10(5): 1000-1017.
  • 8 Burges C J C, Scholkopf B. Improving the accuracy and speed of support vector machines. Neural Information Processing Systems, 1997, 9(7): 375-381.
  • 9 Suykens J A K, Lukas L, Vandewalle J. Sparse least squares support vector machine classifiers. In: Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS 2000), Geneva, Switzerland, 2000: 2757-2760.
  • 10 Zhang Xuegong. Introduction to statistical learning theory and support vector machines[J]. Acta Automatica Sinica, 2000, 26(1): 32-42.

Co-citing literature (7)

Co-cited literature (34)

  • 1 Roweis S T, Saul L K. Nonlinear dimensionality reduction by locally linear embedding[J]. Science, 2000, 290(5500): 2323-2326.
  • 2 Tenenbaum J B, De Silva V, Langford J C. A global geometric framework for nonlinear dimensionality reduction[J]. Science, 2000, 290(5500): 2319-2323.
  • 3 Donoho D L, Grimes C. Hessian eigenmaps: locally linear embedding techniques for high-dimensional data[J]. Proceedings of the National Academy of Sciences, 2003, 100(10): 5591-5596.
  • 4 Belkin M, Niyogi P. Laplacian eigenmaps and spectral techniques for embedding and clustering[C]//Proceedings of Advances in Neural Information Processing Systems, 2001: 585-591.
  • 5 Gómez-Chova L, Camps-Valls G, Munoz-Mari J, et al. Semisupervised image classification with Laplacian support vector machines[J]. IEEE Geoscience and Remote Sensing Letters, 2008, 5(3): 336-340.
  • 6 Belkin M, Niyogi P, Sindhwani V. Manifold regularization: a geometric framework for learning from labeled and unlabeled examples[J]. The Journal of Machine Learning Research, 2006, 7(1): 2399-2434.
  • 7 Kim K I, Steinke F, Hein M. Semi-supervised regression using Hessian energy with an application to semi-supervised dimensionality reduction[C]//Proceedings of Advances in Neural Information Processing Systems, 2009: 979-987.
  • 8 Liu Weifeng, Tao Dacheng. Multiview Hessian regularization for image annotation[J]. IEEE Transactions on Image Processing, 2013, 22(7): 2676-2687.
  • 9 Liu Weifeng, Tao Dacheng, Cheng Jun. Multiview Hessian discriminative sparse coding for image annotation[J]. Computer Vision and Image Understanding, 2014, 118(1): 50-60.
  • 10 Zhang J, Jin R, Yang Y, et al. Modified logistic regression: an approximation to SVM and its applications in large-scale text categorization[C]//Proceedings of ICML, 2003: 888-895.

Citing literature (2)

Second-order citing literature (4)
