A Web Information Retrieval Method Based on Web Page Segmentation

English title as published: Information Retrieval Method based on Page Segmentation

Cited by: 3


Abstract: This paper proposes a Web information retrieval algorithm based on web page segmentation. Because web pages are semi-structured, the algorithm divides each page into regions (topic segments) according to its HTML tags and its content. It first builds an HTML tag tree and then merges nodes of the tree using both content similarity and visual similarity. During retrieval and ranking, it uses this region information, together with the user's query, to search for and rank the relevant pages.
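The abstract outlines three steps (build an HTML tag tree, merge tree nodes by content and visual similarity, rank results using region information) without giving formulas. The Python sketch below shows one way those steps could fit together using only the standard library. It is not the authors' algorithm: the block-tag set, cosine similarity over word counts as the content similarity, "same tag and same class/style attributes" as a proxy for visual similarity, the weight `alpha = 0.5`, the merge threshold `0.5`, merging only adjacent siblings, and scoring a page by its best-matching region are all assumptions, and the names `segment`, `rank`, `PAGE_A` and `PAGE_B` are hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Assumptions (not from the paper): tags that delimit regions, and void tags that never nest.
BLOCK_TAGS = {"body", "div", "section", "article", "table", "tr", "td", "ul", "li", "p"}
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}

class Node:
    """One element of the HTML tag tree."""
    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children, self.text = [], []

    def full_text(self):
        # Own text, then descendants' text (interleaving order is approximate).
        parts = self.text + [c.full_text() for c in self.children]
        return " ".join(p for p in parts if p)

class TagTreeBuilder(HTMLParser):
    """Builds the tag tree; tolerates unclosed tags by walking up to a match."""
    def __init__(self):
        super().__init__()
        self.root = self.current = Node("#root", [])

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        n = self.current
        while n is not self.root and n.tag != tag:
            n = n.parent
        if n is not self.root:
            self.current = n.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())

def cosine(x, y):
    """Content similarity: cosine of the word-count vectors of two texts."""
    a, b = (Counter(re.findall(r"\w+", s.lower())) for s in (x, y))
    dot = sum(v * b[t] for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def visual_similarity(x, y):
    """Hypothetical proxy for visual similarity: same tag, same class/style."""
    if x.tag != y.tag:
        return 0.0
    same = all(x.attrs.get(k) == y.attrs.get(k) for k in ("class", "style"))
    return 1.0 if same else 0.5

def leaf_blocks(node, out):
    """Collect text-bearing block elements with no block-level children."""
    for c in node.children:
        if c.tag in BLOCK_TAGS and not any(g.tag in BLOCK_TAGS for g in c.children):
            if c.full_text():
                out.append(c)
        else:
            leaf_blocks(c, out)
    return out

def segment(html, alpha=0.5, threshold=0.5):
    """Merge adjacent sibling blocks into regions (alpha, threshold: assumed)."""
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    regions = []  # each entry: [parent node, first block of region, region text]
    for block in leaf_blocks(builder.root, []):
        text = block.full_text()
        if regions and regions[-1][0] is block.parent:
            _, first, prev = regions[-1]
            sim = (alpha * cosine(prev, text)
                   + (1 - alpha) * visual_similarity(first, block))
            if sim >= threshold:
                regions[-1][2] = prev + " " + text
                continue
        regions.append([block.parent, block, text])
    return [text for _, _, text in regions]

def rank(pages, query):
    """Score each page by its best-matching region, highest score first."""
    scored = [(max((cosine(query, s) for s in segment(html)), default=0.0), url)
              for url, html in pages.items()]
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [(url, round(score, 3)) for score, url in scored]

# Two tiny hypothetical sample pages for the usage example.
PAGE_A = """<div class="nav"><a>home</a> <a>news</a> <a>sports</a></div>
<div class="post"><p>web page segmentation splits a page into regions</p>
<p>each region covers one topic of the page</p></div>
<div class="ads"><p>buy cheap shoes today</p></div>"""
PAGE_B = """<div><p>web design tips</p></div><table><tr><td>page layout ideas</td></tr></table>
<div><p>segmentation of markets</p></div>"""
```

For example, `rank({"a.html": PAGE_A, "b.html": PAGE_B}, "web page segmentation")` returns `[('a.html', 0.615), ('b.html', 0.333)]`: in `PAGE_A` the two `<p>` blocks under the "post" div are merged into one region that contains all three query terms, while `PAGE_B` spreads them over three unrelated blocks. Scoring each page's concatenated text instead would put b.html slightly ahead (about 0.58 versus 0.54), which illustrates why region information can change the ranking. The visual-similarity proxy ignores rendered layout entirely; a fuller version would need rendering information such as block positions and fonts.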

Authors: Yu Yangxin, Yan Yunyang

Affiliation: Department of Computer Engineering, Huaiyin Institute of Technology

Source: Library and Information Service (图书情报工作), 2009, No. 3, pp. 108-110, 114 (4 pages); indexed in CSSCI and the Peking University core journal list

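Funding: One of the research outcomes of the Huai'an Municipal Science and Technology Plan project "Web-based Science and Technology Plan Project Management System" (Project No. HAG08081)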

Keywords: web page segmentation; information retrieval; HTML tag; similarity

CLC number: TP391.3 [Automation and Computer Technology: Computer Application Technology]
