A Network Counter-terrorism Information Crawler Based on the Regression Analysis
Chinese title: 基于回归分析的网络恐怖信息主题爬虫
Cited by: 4
Abstract [Purpose/significance] Collecting terrorist information from open-source network information is difficult and inefficient. To address this, the paper proposes a regression-analysis method that combines two factors, semantic relevance and web page importance, to improve the collection efficiency of online terrorist information. [Method/process] By analyzing and comparing the characteristics of theme crawlers and relating them to the characteristics of online terrorist information, the paper identifies the strengths of the PageRank and TF-IDF algorithms that suit terrorist information collection. Combined with regression analysis, these are used to predict the relevance of the collection strategy, and the prediction results are fed back to adjust the collection process. [Result/conclusion] Collecting online terrorist information must balance quantity and quality. Building on traditional theme crawler algorithms, the paper proposes an optimized crawler algorithm for collecting open-source online terrorist information, which can improve collection efficiency.
Source: Library and Information Service (《图书情报工作》), CSSCI, PKU Core Journal, 2018, No. 4, pp. 121-129 (9 pages)
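Funding: One of the research outputs of the National Natural Science Foundation of China projects "Research on Multiple-Kernel Methods for Real-Time Active Sensing of Online Public Opinion Events in the Microblog Environment" (project No. 71303075) and "Research on Unsupervised Text Classification Methods Based on Feature Ontology Learning in the Big Data Environment" (project No. 71571064)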
Keywords: theme crawler; regression analysis; network anti-terrorism; semantic similarity
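
The abstract describes combining a semantic-relevance signal (TF-IDF) with a page-importance signal (PageRank) through regression analysis, and feeding the predicted relevance back into the crawl. The record does not give the model itself, so the following is only a minimal sketch under stated assumptions: a linear model fitted by ordinary least squares, a smoothed IDF, cosine similarity against a topic vector, and a frontier sorted by predicted relevance. All function names (`tfidf_vectors`, `cosine`, `pagerank`, `fit_least_squares`, `predict`, `rank_frontier`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one TF-IDF dict per tokenized document (smoothed IDF, an assumption)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    return [{t: (c / len(doc)) * idf[t] for t, c in Counter(doc).items()}
            for doc in docs]

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: node -> list of outgoing links."""
    nodes = set(links) | {v for outs in links.values() for v in outs}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u, outs in links.items():
            for v in outs:
                new[v] += damping * rank[u] / len(outs)
        rank = new
    return rank

def fit_least_squares(rows, targets):
    """Ordinary least squares with an intercept, solved via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * t for x, t in zip(X, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting (assumes X^T X is invertible).
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(weights, features):
    """Predicted relevance: w0 + w1 * similarity + w2 * pagerank."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def rank_frontier(weights, candidates):
    """Order candidate URLs (url -> feature tuple) by predicted relevance, highest first."""
    return sorted(candidates, key=lambda url: predict(weights, candidates[url]),
                  reverse=True)
```

In such a setup, pages already fetched and labeled for relevance would supply the training rows `(similarity, pagerank)`, and the fitted weights would be reused to reorder the queue of unvisited URLs. Refitting after each batch is one way to realize the feedback step the abstract mentions, though the paper may do this differently.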
