Abstract
This paper studies how to improve the design of theme crawlers so that an efficient theme crawler can replace the general-purpose crawler of a traditional search engine and carry out directional information collection with higher precision. Building on a successfully implemented keyword-based theme crawler, an algorithm for computing the degree of correlativity (topic relevance) based on concept analysis is proposed, and an implementation scheme for a theme crawler based on concept analysis is given. A comparison of the experimental results from the two theme crawlers shows that crawler performance improved, demonstrating that the design is feasible and practical and laying a good foundation for accurate directional information collection.
Source
《北京理工大学学报》 (Transactions of Beijing Institute of Technology)
Indexed in: EI, CAS, CSCD, 北大核心 (Peking University Core Journals)
2004, No. 10, pp. 890-893 (4 pages)
Funding
Cooperative project with 扬州万方电子技术有限责任公司 (Yangzhou Wanfang Electronic Technology Co., Ltd.) (2003.08)
Keywords
search engine
theme crawler
concept analysis
degree of correlativity
information collection