Abstract
Scientific literature management requires large volumes of scientific and technical documents to be identified, classified, and stored efficiently. Researchers studying a field typically search for articles by its experts, but author names, a common search key, are often ambiguous, which degrades the quality of literature retrieval, statistics, and analysis. Existing methods still do not cluster well on available datasets, and effective disambiguation remains a challenge. This paper proposes an author name disambiguation technique based on graph convolutional neural networks. First, a BERT model embeds multiple attributes of each document, such as authors, publishing organization, and abstract, into a low-dimensional vector space, yielding embedding vectors for the author-related attributes and addressing the problem of insufficiently accurate embeddings. Next, using these node embeddings, a local graph is built for each document, and a graph convolutional neural network performs link prediction on the generated local graphs, which helps improve link prediction accuracy. Finally, clustering is performed on the graph with a simple connected-component search combined with dynamic pruning. Experiments show that the proposed method improves performance and increases the accuracy of author name disambiguation.
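The final clustering step (connected-component search with pruning over predicted links) can be illustrated with a minimal Python sketch. The abstract does not state the pruning rule, so the rule used here (raising the link-score threshold inside any cluster larger than `max_size`) is an assumption made for illustration, as are the function names, parameters, and sample scores; the scores stand in for link probabilities that a trained model would produce.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into connected components with union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return [sorted(g) for g in groups.values()]


def cluster_with_pruning(nodes, scored_edges, threshold=0.5, max_size=3, step=0.1):
    """Cluster papers by predicted links and re-split oversized clusters.

    scored_edges: list of (paper_u, paper_v, score), score in [0, 1].
    Hypothetical pruning rule: inside any cluster larger than max_size,
    raise the threshold by `step` and search for components again.
    """
    kept = [(u, v) for u, v, s in scored_edges if s >= threshold]
    clusters = []
    for comp in connected_components(nodes, kept):
        if len(comp) > max_size and threshold + step <= 1.0:
            members = set(comp)
            inner = [e for e in scored_edges
                     if e[0] in members and e[1] in members]
            clusters.extend(
                cluster_with_pruning(comp, inner, threshold + step, max_size, step))
        else:
            clusters.append(comp)
    return sorted(clusters)


# Made-up papers and link scores for one ambiguous author name.
papers = ["p1", "p2", "p3", "p4", "p5"]
links = [("p1", "p2", 0.9), ("p2", "p3", 0.55),
         ("p3", "p4", 0.95), ("p4", "p5", 0.2)]
print(cluster_with_pruning(papers, links, threshold=0.5, max_size=2))
# → [['p1', 'p2'], ['p3', 'p4'], ['p5']]
```

In this example the weak `p2`-`p3` link first merges four papers into one cluster; because that cluster exceeds `max_size`, the threshold rises to 0.6 and the weak link is pruned, leaving two clusters that would each correspond to one real author.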
Authors
SHI Nong (施浓)
NIE Tie-zheng (聂铁铮)
SHEN De-rong (申德荣)
KOU Yue (寇月)
YU Ge (于戈)
School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journals (北大核心)
2021, Issue 10, pp. 2217-2222 (6 pages)
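Funding
Supported by the Fundamental Research Funds for the Central Universities (N180716010)
National Natural Science Foundation of China project.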
Keywords
author disambiguation
graph convolutional neural network
node embedding
link prediction
named entity recognition