Abstract
This paper uses CHAMELEON clustering to produce generalized instances and applies a generalized-instance text classification algorithm with backtracking, which improves the model and significantly reduces the computation time of text classification. Experiments on document data from two corpora show that the improved backtracking algorithm matches the accuracy of the traditional KNN classification algorithm on both corpora; its execution speed improved 10-fold on one corpus and 8-fold on the other; and on the Tan corpus its accuracy is 3 percentage points higher than that of the SVM text classification algorithm. The study offers a useful reference for big data storage in the information field.
Source
《信息技术》 (Information Technology)
2016, No. 4, pp. 109-113 (5 pages)
Keywords
generalized instance
instance text
GIS algorithm
execution speed
accuracy