Patent information clustering technique based on latent Dirichlet allocation model (cited by: 22)
Abstract: Traditional methods of collecting patent intelligence cannot keep pace with the rapid growth of patent information. To address this problem, topic models and clustering algorithms suited to patent information clustering were studied, and a solution combining the Latent Dirichlet Allocation (LDA) topic model with the Ordering Points to Identify the Clustering Structure (OPTICS) algorithm is proposed. The solution uses the LDA topic model to transform the high-dimensional representation of patent information in the vocabulary space into a low-dimensional representation in the topic space, which reduces dimensionality efficiently. It then applies the OPTICS algorithm together with the k-nearest-neighbor criterion to cluster the patent information, so that patent intelligence of interest can be collected. Theoretical analysis and experimental verification indicate that the solution improves the efficiency of patent clustering through dimensionality reduction and also supports the analysis of patent information.
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2013, Issue A01, pp. 87-89, 93 (4 pages in total)
Keywords: Latent Dirichlet Allocation (LDA) topic model; clustering analysis; OPTICS algorithm; patent information clustering; patent analysis
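
The two-stage pipeline outlined in the abstract (LDA for dimensionality reduction, then OPTICS for clustering in topic space) can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the authors' implementation. The abstract does not say how LDA was inferred, so collapsed Gibbs sampling is assumed here. The function names `lda_gibbs`, `optics` and `extract_clusters`, the hyperparameters `alpha` and `beta`, the toy corpus and the reachability threshold are all hypothetical. The k-nearest-neighbor criterion is left out because the abstract does not specify its role.

```python
import math
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]             # counts n(d, k)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # counts n(k, w)
    topic_total = [0] * n_topics                           # counts n(k)
    assign = []
    for d, doc in enumerate(docs):                         # random initial topics
        assign.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, assign[d]):
            doc_topic[d][k] += 1
            topic_word[k][vocab[w]] += 1
            topic_total[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = vocab[w], assign[d][i]
                doc_topic[d][k] -= 1                       # remove current assignment
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                           / (topic_total[t] + n_words * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k                           # resample and restore counts
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed document-topic proportions: the low-dimensional representation.
    return [[(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
            for d, doc in enumerate(docs)]


def optics(points, min_pts):
    """OPTICS with an unbounded radius; needs len(points) >= min_pts.

    Returns (visit order, reachability distances, core distances).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Core distance: distance to the min_pts-th nearest point, counting the point itself.
    core = [sorted(row)[min_pts - 1] for row in dist]
    reach = [math.inf] * n
    done = [False] * n
    order = []
    for start in range(n):
        if done[start]:
            continue
        seeds = {start}
        while seeds:
            # Always expand the unprocessed point with the smallest reachability.
            p = min(seeds, key=lambda q: (reach[q], q))
            seeds.remove(p)
            done[p] = True
            order.append(p)
            for q in range(n):
                if not done[q]:
                    reach[q] = min(reach[q], max(core[p], dist[p][q]))
                    seeds.add(q)
    return order, reach, core


def extract_clusters(order, reach, core, threshold):
    """Cut the reachability plot at `threshold`; label -1 marks noise."""
    labels = [-1] * len(order)
    cluster = -1
    for p in order:
        if reach[p] > threshold:
            if core[p] <= threshold:   # dense enough to start a new cluster
                cluster += 1
                labels[p] = cluster
        else:
            labels[p] = cluster        # continues the current cluster
    return labels


if __name__ == "__main__":
    # Hypothetical tokenized patent abstracts (illustrative only).
    docs = [
        ["battery", "electrode", "lithium", "cell", "electrode"],
        ["lithium", "battery", "anode", "cell"],
        ["electrode", "anode", "battery", "lithium"],
        ["antenna", "signal", "wireless", "receiver"],
        ["wireless", "antenna", "signal", "signal"],
        ["receiver", "signal", "antenna", "wireless"],
    ]
    theta = lda_gibbs(docs, n_topics=2)            # vocabulary space -> topic space
    order, reach, core = optics(theta, min_pts=2)  # cluster ordering in topic space
    print(extract_clusters(order, reach, core, threshold=0.2))
```

Clustering runs on the topic proportions rather than on raw word counts, so each distance computation involves only `n_topics` dimensions instead of the vocabulary size, which is the efficiency gain the abstract attributes to dimensionality reduction. For real corpora, library implementations such as scikit-learn's `LatentDirichletAllocation` and `OPTICS` classes would normally be preferable to this hand-written sketch.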