Abstract
Standard Support Vector Machine (SVM) training requires O(l^3) time and O(l^2) space, where l is the number of training examples, making it computationally infeasible on very large data sets. This paper presents a novel SVM training algorithm based on approximate solutions: the Approximate Vector Machine (AVM). AVM uses an incremental learning strategy to find an approximately optimal separating hyperplane, and applies warm-start and sampling tricks in each iteration to accelerate training. Theoretical analysis shows that the algorithm's time and space complexities are independent of the training set size, so it scales well on both counts. Experiments on very large data sets show that the proposed method greatly speeds up training while preserving the generalization performance of the original SVM classifier, and that the resulting classifier contains fewer support vectors and is therefore faster at prediction time.
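The abstract only sketches AVM at a high level, so the following is a minimal illustration of the general strategy it describes, not the paper's actual algorithm: grow a small set of candidate support vectors incrementally, use random sampling to find a violating point cheaply, and keep coefficients across iterations as a warm start. The function names, the linear kernel, and the perceptron-style coefficient update are all assumptions made for the sketch.

```python
import random

def linear_kernel(x, y):
    """Linear kernel for simplicity; a real system would likely use an RBF kernel."""
    return sum(a * b for a, b in zip(x, y))

def train_approx(X, y, kernel=linear_kernel, iters=200, sample_size=16, seed=0):
    """Illustrative incremental training loop (not AVM itself).

    Each round samples a subset of the data (the sampling trick),
    finds the sampled point with the worst margin under the current
    model, and updates its coefficient while keeping all previous
    coefficients (a warm start across iterations).
    """
    rng = random.Random(seed)
    core = []      # indices of vectors kept in the model ("core set")
    alpha = {}     # their coefficients, preserved between rounds

    def f(x):
        # Decision function built only from the small core set.
        return sum(alpha[i] * y[i] * kernel(X[i], x) for i in core)

    for _ in range(iters):
        sample = rng.sample(range(len(X)), min(sample_size, len(X)))
        j = min(sample, key=lambda i: y[i] * f(X[i]))  # worst margin in the sample
        if y[j] * f(X[j]) > 0:   # no violator found in the sample: stop early
            break
        if j not in alpha:
            core.append(j)
            alpha[j] = 0.0
        alpha[j] += 1.0          # perceptron-style correction on the violator
    return core, alpha, f

# Toy, linearly separable data to exercise the loop.
X = [(1, 1), (2, 1), (1.5, 0.5), (-1, -1), (-2, -1), (-1.5, -0.5)]
y = [1, 1, 1, -1, -1, -1]
core, alpha, f = train_approx(X, y)
```

Because the model is built only from the core set, its size (and hence prediction cost) is bounded by the number of iterations rather than by the full training set, which mirrors the scalability argument in the abstract.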
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2009, No. 11, pp. 208-212 (5 pages)
Funding
National Natural Science Foundation of China (60773177)
Fujian Provincial Young Talents Program (2008F3108)
Xiamen University of Technology Talent Introduction Program (YKJ08003R)
Keywords
Support vector machine
Kernel function
Incremental learning
Approximate solution
Core set