Abstract
With the rapid growth of information on the World Wide Web (WWW), users increasingly need automatic Web page classifiers. To improve classification accuracy, this paper proposes a new Chinese Web page classification algorithm based on projection pursuit (PP). We first use a genetic algorithm to search for the best projection direction, and then project each Web page, represented as an n-dimensional vector, onto a one-dimensional space. Finally, the classical k-nearest neighbor (KNN) algorithm classifies the projected pages. This method can overcome the curse of dimensionality. Experimental results show that the proposed algorithm is feasible and effective.
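The three steps named in the abstract (genetic search for a projection direction, projection to one dimension, KNN classification) can be illustrated with a minimal Python sketch. The abstract does not state the paper's projection index, genetic operators, or parameters, so everything specific here is an assumption made for illustration: the index is a simple between-class over within-class variance ratio, the genetic algorithm uses elitist selection with averaging crossover and Gaussian mutation, and the names `projection_index`, `ga_search`, and `knn_predict` as well as the toy data are hypothetical.

```python
import random

def project(vec, direction):
    """Project an n-dimensional vector onto a direction (dot product)."""
    return sum(v * d for v, d in zip(vec, direction))

def normalize(direction):
    """Scale a direction to unit length (left unchanged if it is all zeros)."""
    norm = sum(d * d for d in direction) ** 0.5
    return [d / norm for d in direction] if norm > 0 else direction

def projection_index(direction, X, y):
    """Assumed index (not from the paper): between-class variance divided
    by within-class variance of the 1-D projections."""
    z = [project(x, direction) for x in X]
    overall = sum(z) / len(z)
    between = within = 0.0
    for label in set(y):
        group = [zi for zi, yi in zip(z, y) if yi == label]
        mean = sum(group) / len(group)
        between += len(group) * (mean - overall) ** 2
        within += sum((zi - mean) ** 2 for zi in group)
    return between / (within + 1e-12)

def ga_search(X, y, pop_size=30, generations=60, mutation_scale=0.2, seed=0):
    """Simplified genetic search: keep the fitter half, refill the population
    with mutated averages of two surviving parents."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [normalize([rng.uniform(-1, 1) for _ in range(n)])
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda d: projection_index(d, X, y), reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [(ai + bi) / 2 + rng.gauss(0, mutation_scale)
                     for ai, bi in zip(a, b)]
            children.append(normalize(child))
        pop = elite + children
    return max(pop, key=lambda d: projection_index(d, X, y))

def knn_predict(z_train, y_train, z_query, k=3):
    """Majority vote among the k nearest training projections in 1-D."""
    nearest = sorted(zip(z_train, y_train),
                     key=lambda p: abs(p[0] - z_query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Toy feature vectors standing in for term-weight vectors of Web pages.
X = [[1.0, 0.2, 0.1], [0.9, 0.1, 0.3], [1.1, 0.3, 0.2],
     [0.1, 0.2, 0.9], [0.2, 0.1, 1.1], [0.0, 0.3, 1.0]]
y = ["sports", "sports", "sports", "finance", "finance", "finance"]

direction = ga_search(X, y)
z_train = [project(x, direction) for x in X]
pred = knn_predict(z_train, y, project([1.0, 0.2, 0.2], direction))
print(pred)
```

Because classification happens on scalar projections, the nearest-neighbor search reduces to comparing absolute differences, which is one way to read the abstract's claim about the curse of dimensionality. A real system would also need word segmentation and feature weighting for Chinese pages, which this sketch leaves out.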
Source
《中文信息学报》
CSCD
PKU Core Journal
2005, Issue 4, pp. 60-67 (8 pages)
Journal of Chinese Information Processing
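Funding
Key Science and Technology Project of the Ministry of Education (03070)
Natural Science Foundation of Jiangxi Province (0311041)
Jiangxi Normal University Youth Growth Fund (1090)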
Keywords
computer application
Chinese information processing
projection pursuit
Web page classification
genetic algorithm
KNN algorithm