Abstract
The popularity of microblogging has made problems such as information overload increasingly prominent, and helping users find the microblog posts they need quickly and accurately has become an urgent problem. Microblog recommendation based on collaborative filtering or on LDA achieves a certain level of accuracy, but it does not resolve two issues: content categories that are too coarse, and the weaknesses of the LDA model on short texts. This paper therefore proposes a personalized microblog recommendation model that integrates content similarity with multi-feature computation. First, starting from the semantics of microblog content, the content similarity between a user and a microblog post is computed with word2vec. Then, the freshness and popularity of the post are computed from features such as its posting time and its numbers of likes, comments, and reposts. Finally, content similarity, freshness, and popularity are combined into a ranking score for each post, which yields the personalized recommendations. Because the model recommends on the basis of content similarity, it avoids the two issues above and produces recommendations that are semantically more precise. Experimental results show that the proposed model performs well in precision, recall, and F-measure: precision improves markedly, by about 10%, and F-measure improves by about 5%, which demonstrates the effectiveness of the model.
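The three-step pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: cosine similarity over averaged word2vec vectors, an exponential half-life decay for freshness, a log-scaled interaction count for popularity, a weighted sum for the ranking score, and all names (`Post`, `freshness`, `popularity`, `ranking_score`, `recommend`) and default parameter values.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Post:
    # Hypothetical representation: the mean of the post's word2vec word vectors.
    vector: Sequence[float]
    age_hours: float
    likes: int
    comments: int
    reposts: int


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def freshness(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Assumed decay: the score halves every `half_life_hours`."""
    return 0.5 ** (age_hours / half_life_hours)


def popularity(likes: int, comments: int, reposts: int, cap: int = 10_000) -> float:
    """Assumed log scaling of total interactions into the range [0, 1]."""
    total = likes + comments + reposts
    return min(1.0, math.log1p(total) / math.log1p(cap))


def ranking_score(user_vector: Sequence[float], post: Post,
                  weights: Sequence[float] = (0.6, 0.2, 0.2)) -> float:
    """Assumed weighted sum of similarity, freshness, and popularity."""
    w_sim, w_fresh, w_pop = weights
    return (w_sim * cosine_similarity(user_vector, post.vector)
            + w_fresh * freshness(post.age_hours)
            + w_pop * popularity(post.likes, post.comments, post.reposts))


def recommend(user_vector: Sequence[float], posts: Sequence[Post],
              top_k: int = 10) -> list:
    """Return the top_k posts ordered by descending ranking score."""
    ranked = sorted(posts, key=lambda p: ranking_score(user_vector, p), reverse=True)
    return ranked[:top_k]
```

In this sketch the user vector would be built the same way as a post vector, for example by averaging the vectors of posts the user has interacted with; the weights and the half-life would need tuning on real data.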
Authors
刘宇东
孙豪
蒋运承
LIU Yu-dong; SUN Hao; JIANG Yun-cheng (School of Computer Science, South China Normal University, Guangzhou 510631, China)
Source
《计算机科学》
CSCD
PKU Core Journal (北大核心)
2020, Issue 10, pp. 97-101 (5 pages)
Computer Science
Funding
General Program of the National Natural Science Foundation of China (61772210)
Guangzhou Science and Technology Program (201807010043).