
Scientific Literature Terms Extraction Based on Bidirectional Long Short-Term Memory Model (cited by: 9)
Abstract: Term extraction is an important technique for analyzing scientific literature. To further improve the precision and recall of term extraction from scientific literature, this paper adopts a neural network model based on BLSTM (Bidirectional Long Short-Term Memory). A pre-trained word-vector dictionary maps the Chinese word-segmentation results to vectors, which serve as the input to the BLSTM model; a sequence-tagging scheme then maps the model's output classifications to term boundaries, from which the terms are extracted. Experiments on datasets from the automation technology and computer technology domains compare a BLSTM model that uses words as features with a conditional random field (CRF) model. The results show that the BLSTM model performs better: on the Chinese dataset it reaches a best precision of 0.7821, a best recall of 0.8020, and a best F1 of 0.7860; on the English dataset the corresponding values are 0.8525, 0.8677, and 0.8543.
Authors: ZHAO Dongyue; DU Yongping; SHI Chongde (Faculty of Information Science, Beijing University of Technology, Beijing 100124, China; Institute of Scientific and Technical Information of China, Beijing 100038, China)
Source: Technology Intelligence Engineering (《情报工程》), 2018, No. 1, pp. 67-74 (8 pages)
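Funding: National Natural Science Foundation of China, Young Scientists Fund project "Research on Entity Recognition and Relation Extraction for Science and Technology Monitoring" (71403257)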
Keywords: term extraction; scientific literature; long short-term memory (LSTM)
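
The abstract describes mapping per-token classification output to term boundaries by sequence tagging, and reports precision, recall, and F1. The following is a minimal sketch of that decoding and scoring step only, not the authors' implementation. It assumes a B/I/O tag scheme (the abstract does not name the scheme), and the function names `tags_to_terms` and `precision_recall_f1` and the example sentence are hypothetical. The tag sequences stand in for the output of a tagger such as a BLSTM or CRF.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start index, exclusive end index, term text)


def tags_to_terms(tokens: List[str], tags: List[str]) -> List[Span]:
    """Turn B/I/O tags (an assumed scheme) into term spans."""
    bounds = []
    start = None  # start index of the term currently being read, if any
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:      # a new term closes the previous one
                bounds.append((start, i))
            start = i
        elif tag == "I":
            if start is None:          # stray "I": leniently treat it as a start
                start = i
        else:                          # "O" closes any open term
            if start is not None:
                bounds.append((start, i))
                start = None
    if start is not None:              # term that runs to the end of the sentence
        bounds.append((start, len(tags)))
    return [(s, e, " ".join(tokens[s:e])) for s, e in bounds]


def precision_recall_f1(predicted: List[Span], gold: List[Span]):
    """Exact-match scoring: a predicted term counts only if its span matches."""
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example sentence with gold and predicted tags.
tokens = ["we", "train", "a", "conditional", "random", "field",
          "and", "a", "neural", "network"]
gold_tags = ["O", "O", "O", "B", "I", "I", "O", "O", "B", "I"]
pred_tags = ["O", "O", "O", "B", "I", "I", "O", "O", "O", "B"]

gold_terms = tags_to_terms(tokens, gold_tags)
pred_terms = tags_to_terms(tokens, pred_tags)
scores = precision_recall_f1(pred_terms, gold_terms)
print(pred_terms)
print(scores)
```

In this example one of the two predicted terms matches a gold term exactly, so precision, recall, and F1 are all 0.5. Exact-span matching is one common scoring convention; the abstract does not state which matching criterion the paper uses.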