Abstract
To address the topic drift and the bias toward old pages that the WPR (Weighted PageRank) algorithm exhibits in Web search, we propose an improved algorithm, WTFPR (Weighted Topic Frequency PageRank), which combines two factors: the topic features of a Web page and the frequency with which the page was referenced in the most recent search cycle. Through content analysis, the algorithm uses an improved TF-IDF algorithm to measure page relevance and thereby reduce topic drift. It uses each page's referenced frequency in the most recent search cycle to raise the PR value of newer, high-value pages, thereby reducing the bias toward old pages. Simulation results show that the improved algorithm achieves better results than WPR.
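The abstract does not give the WTFPR formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each page already has a topic relevance score (for example, a TF-IDF based similarity) and a count of references received in the most recent search cycle. The function name, the parameters, and the way the two factors are multiplied together are all hypothetical choices made for illustration.

```python
def weighted_topic_frequency_rank(links, relevance, recent_refs,
                                  damping=0.85, iterations=50):
    """Illustrative PageRank variant (assumed form, not the paper's formula).

    links:       dict mapping each page to the list of pages it links to;
                 every linked page must also appear as a key.
    relevance:   dict mapping each page to a topic relevance score >= 0.
    recent_refs: dict mapping each page to the number of times it was
                 referenced in the most recent search cycle.
    """
    pages = list(links)
    n = len(pages)

    # Hypothetical weighting: a target page attracts more rank when it is
    # on topic and was referenced often in the recent cycle.
    attract = {p: relevance[p] * (1 + recent_refs[p]) for p in pages}

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for src in pages:
            targets = links[src]
            total = sum(attract[t] for t in targets)
            if not targets or total == 0:
                # No usable out-links: spread this page's rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[src] / n
                continue
            for t in targets:
                # Split the source's rank in proportion to target weights.
                new_rank[t] += damping * rank[src] * attract[t] / total
        rank = new_rank
    return rank


if __name__ == "__main__":
    links = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
    relevance = {"A": 0.5, "B": 0.5, "C": 0.5}
    recent_refs = {"A": 0, "B": 0, "C": 9}
    print(weighted_topic_frequency_rank(links, relevance, recent_refs))
```

In this toy graph, pages B and C are equally relevant, but C was referenced more often in the recent cycle, so it receives the larger share of A's rank. That is the kind of effect the abstract describes for newer, high-value pages.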
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals (北大核心)
2016, Issue 2, pp. 86-88 (3 pages)
Keywords
Topic features
Referenced frequency
Bias towards old pages
Search cycle
Topic drift