Abstract
Taking the Sina Weibo platform as the research object, this study uses Python and Web automation tools to authenticate automatically through the platform's application interface and to crawl microblog data. After the data are converted into the required format, a depth-first search algorithm is applied to derive the relationships among users, which are then visualized. In addition, an improved K-means algorithm is used for topic clustering; experimental results show that the improved algorithm is more accurate and effective. Finally, an interest correlation matrix is generated from user information, and the improved K-means algorithm is used to analyze the similarity of the interests that microblog users follow.
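The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe how K-means was improved, so plain K-means (Lloyd's algorithm with a naive first-k initialization) stands in for it, and a user-by-topic indicator matrix stands in for the interest correlation matrix. The function names (`dfs_relationships`, `interest_matrix`, `kmeans`), the `follows` and `interests` dictionaries, and the `max_depth` parameter are all hypothetical, and the data are assumed to have been crawled already.

```python
def dfs_relationships(follows, start, max_depth=2):
    """Collect (follower, followee) edges reachable from `start` by depth-first search.

    `follows` maps a user to the list of users they follow (assumed input format).
    Users first reached at `max_depth` are recorded but not expanded.
    """
    edges, visited = [], set()

    def visit(user, depth):
        visited.add(user)
        if depth == max_depth:
            return
        for friend in follows.get(user, []):
            edges.append((user, friend))
            if friend not in visited:
                visit(friend, depth + 1)

    visit(start, 0)
    return edges


def interest_matrix(interests):
    """Build a user-by-topic 0/1 matrix from a mapping of user -> set of topics."""
    topics = sorted({t for ts in interests.values() for t in ts})
    users = sorted(interests)
    rows = [[1.0 if t in interests[u] else 0.0 for t in topics] for u in users]
    return users, topics, rows


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain K-means; the first k points serve as initial centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

The edge list returned by `dfs_relationships` can be passed to any graph-drawing tool for visualization, and the labels returned by `kmeans` group users whose followed topics overlap. A real crawler would also need the authorization and rate-limit handling that the abstract mentions but this sketch omits.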
Source
Journal of Intelligence (《情报杂志》)
CSSCI
Peking University Core Journals (北大核心)
2014, No. 6, pp. 144-148 (5 pages)
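Funding
Guiding Project of the Science and Technology Research Program of the Hubei Provincial Department of Education, "LP-Based Social Network User Relationship Mining Platform" (No. B2013258)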
Keywords
Sina Weibo
user relationship
data mining
clustering analysis