Abstract
Through an experimental analysis of the structure and characteristics of Web pages, this paper gives principles and methods for segmenting a page into blocks. Building on that segmentation, and on the patterns with which noise appears in Web pages, it proposes a noise elimination method. The method lets a search engine remove hyperlinks to irrelevant and indirectly related items during its page preprocessing stage, greatly improving retrieval quality.
Funding
Scientific Research Program of the Hebei Provincial Department of Education (Project No. 2008202)
Keywords
retrieval quality
segmentation model
search engine