Abstract
With the rapid development of the telecommunications industry, the volume of SMS text has become enormous, reaching hundreds of millions of messages, and text such as query logs and SMS messages plays an increasingly important role in daily life. Large classes of Chinese SMS text also contain hidden hot events. Most existing clustering methods are hard to apply to this kind of data because of its scale. The proposed method exploits the cohesion of SMS text within a given time period: the SMS texts to be clustered are sorted, and isolated messages and small classes of SMS text are removed during clustering. Experiments show that the method clusters the large classes in massive SMS text collections very efficiently.
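The abstract gives only the outline of the method, so the following is a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that "sorting" means ordering messages by timestamp, that similarity is Jaccard overlap of character bigrams, and that a cluster counts as isolated or small when it has fewer than `min_size` members and has received no message for `window` seconds. The names `cluster_large_classes`, `threshold`, `window`, and `min_size` are hypothetical.

```python
from dataclasses import dataclass, field


def bigrams(text: str) -> frozenset:
    """Character bigrams of the text with whitespace removed (assumed feature)."""
    t = "".join(text.split())
    if len(t) < 2:
        return frozenset([t]) if t else frozenset()
    return frozenset(t[i:i + 2] for i in range(len(t) - 1))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two feature sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class Cluster:
    features: frozenset          # features of the first message (representative)
    last_seen: float             # timestamp of the most recent member
    members: list = field(default_factory=list)


def cluster_large_classes(messages, threshold=0.5, window=3600.0, min_size=3):
    """Single-pass clustering of (timestamp, text) pairs.

    Assumption: messages about one event arrive close together in time,
    so small clusters that go quiet can be discarded early.
    """
    clusters = []
    for ts, text in sorted(messages, key=lambda m: m[0]):
        # Drop clusters that are still small and have gone quiet.
        clusters = [c for c in clusters
                    if len(c.members) >= min_size or ts - c.last_seen <= window]
        feats = bigrams(text)
        best = max(clusters, key=lambda c: jaccard(feats, c.features),
                   default=None)
        if best is not None and jaccard(feats, best.features) >= threshold:
            best.members.append(text)
            best.last_seen = ts
        else:
            clusters.append(Cluster(feats, ts, [text]))
    # Only large classes are reported.
    return [c.members for c in clusters if len(c.members) >= min_size]
```

Pruning inside the loop keeps the list of candidate clusters short, which is where the efficiency gain described in the abstract would plausibly come from; the thresholds here are illustrative and not taken from the paper.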
Source
Journal of Intelligence (《情报杂志》)
CSSCI
PKU Core Journal
2013, Issue 2, pp. 30-33 (4 pages)
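Funding
Hebei Province Science and Technology Support Program project “垃圾信息的预意识别” (on the recognition of spam messages; No. 10213581)
Huaiyin Institute of Technology Key Fund project (No. HGA0907)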
Keywords
large size class; SMS text; clustering method; hot events