Design of Theme Crawler Based on Concept Analysis

Abstract This paper studies how to improve the design of theme crawlers, so that an efficient theme crawler can replace the general-purpose crawler of a traditional search engine and carry out targeted information collection with higher precision. Building on a successfully implemented keyword-based theme crawler, it proposes an algorithm that computes topic relevance (the degree of correlativity) from concepts rather than keywords, and gives an implementation scheme for a theme crawler based on concept analysis. A comparison of the experimental results of the two crawlers shows that crawler performance improved, which supports the feasibility and practicality of the design and lays the groundwork for accurate targeted information collection.
Source Transactions of Beijing Institute of Technology (《北京理工大学学报》), 2004, No. 10, pp. 890-893 (4 pages). Indexed in EI, CAS, CSCD, and the Peking University core journal list.
Funding Cooperative project with Yangzhou Wanfang Electronic Technology Co., Ltd. (扬州万方电子技术有限责任公司) (2003.08)
Keywords search engine; theme crawler; concept analysis; degree of correlativity; information collection
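The abstract contrasts keyword-based and concept-based relevance analysis but does not give the algorithm itself. The sketch below only illustrates the general difference between the two ideas and is not the authors' method. The concept dictionary `CONCEPTS`, the two scoring functions, and the sample page are all hypothetical and were chosen for illustration.

```python
import re

# Hypothetical concept dictionary (an assumption, not from the paper):
# each concept maps to surface terms that may express it.
CONCEPTS = {
    "crawler": {"crawler", "spider", "robot"},
    "search_engine": {"search", "engine", "retrieval"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_relevance(text, keywords):
    """Fraction of tokens that literally match one of the topic keywords."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in keywords)
    return hits / len(tokens)


def concept_relevance(text, topic_concepts, concepts=CONCEPTS):
    """Fraction of topic concepts expressed by at least one term in the text."""
    if not topic_concepts:
        return 0.0
    tokens = set(tokenize(text))
    covered = sum(1 for name in topic_concepts if tokens & concepts[name])
    return covered / len(topic_concepts)


# A page that discusses the topic without using the literal keywords.
page = "A spider walks the web and feeds an information retrieval index."

keyword_score = keyword_relevance(page, {"crawler", "search", "engine"})
concept_score = concept_relevance(page, ["crawler", "search_engine"])
print(keyword_score, concept_score)
```

On this sample page the keyword score is zero because none of the literal keywords occur, while the concept score is nonzero because related terms are mapped to the same concepts. A crawler could compare such a score against a threshold to decide whether to follow a page's links, although the threshold and the scoring formula used in the paper are not stated in this record.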