Abstract
Traditional approaches to collecting patent intelligence cannot keep pace with the rapid growth of patent information. To address this, topic models and clustering algorithms suited to patent information clustering were studied, and a solution combining the Latent Dirichlet Allocation (LDA) topic model with the Ordering Points To Identify the Clustering Structure (OPTICS) algorithm was proposed. The solution uses the LDA topic model to transform the high-dimensional representation of patent information in lexical space into a low-dimensional representation in topic space, which reduces dimensionality efficiently. It then applies the OPTICS algorithm together with the k-nearest-neighbor criterion to cluster the patent information, so that patent intelligence of interest can be collected. Theoretical analysis and experimental verification indicate that the proposed solution not only improves the efficiency of patent clustering through dimension reduction but also supports the analysis of patent information.
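The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the toy corpus, the number of topics, and the `min_samples` value are all assumptions chosen for the example, and the k-nearest-neighbor criterion mentioned in the abstract is not reproduced here because the abstract does not specify how it is applied.

```python
from sklearn.cluster import OPTICS
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus standing in for patent abstracts (not from the paper).
patents = [
    "battery electrode lithium cell anode",
    "lithium battery cathode electrolyte cell",
    "electrode coating for lithium ion battery",
    "antenna signal wireless receiver circuit",
    "wireless antenna array signal transmitter",
    "radio receiver antenna signal filter",
]

# Lexical space: one dimension per vocabulary term.
term_counts = CountVectorizer().fit_transform(patents)

# Topic space: one dimension per topic. The topic count is an assumed value.
n_topics = 2
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
doc_topic = lda.fit_transform(term_counts)  # one topic distribution per document

# Density-based clustering on the low-dimensional topic vectors.
# min_samples=2 is an assumption suited to this tiny corpus.
optics = OPTICS(min_samples=2)
labels = optics.fit_predict(doc_topic)

print("vocabulary size:", term_counts.shape[1])
print("topic vector size:", doc_topic.shape[1])
print("cluster labels (-1 = noise):", labels)
```

The point of the sketch is the change of representation: each document goes from a vector with one entry per vocabulary term to a vector with one entry per topic, and OPTICS then computes distances only in that smaller space.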
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2013, Issue A01, pp. 87-89, 93 (4 pages)
Keywords
Latent Dirichlet Allocation (LDA) topic model
clustering analysis
OPTICS algorithm
patent information clustering
patent analysis