Abstract
Integrating word embeddings into character-based models has recently been shown to be effective for Chinese named entity recognition. However, most existing studies overlook radical information. This paper proposes a new method that fuses word embeddings, character embeddings, and radical embeddings. The method combines the strengths of word and character embeddings with the rich semantic information carried by Chinese radicals, so that semantic information at different granularities is exploited and recognition performance improves. Experimental results on the Weibo and MSRA datasets show that the proposed method achieves higher recognition accuracy than related methods.
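The abstract does not say how the three embeddings are combined; the keywords mention only an attention mechanism. The following is a minimal sketch, assuming that each character has a character, word, and radical vector of equal dimension and that a query vector scores them by dot product. The names `fuse_embeddings` and `query` are hypothetical illustrations, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_embeddings(char_vec, word_vec, radical_vec, query):
    """Attention-weighted sum of three same-dimension embeddings.

    Assumption: dot-product scoring against a single query vector.
    The paper may use a different attention form.
    """
    candidates = [char_vec, word_vec, radical_vec]
    # One relevance score per granularity (character, word, radical).
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in candidates]
    weights = softmax(scores)
    dim = len(char_vec)
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, candidates))
        for i in range(dim)
    ]
    return fused, weights


# Toy 2-dimensional vectors; real embeddings would be learned.
fused, weights = fuse_embeddings([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0])
```

With an all-zero query the three granularities receive equal weight, so the fused vector is their plain average; a learned query would shift weight toward whichever granularity is most informative for a given character.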
Authors
YIN Chenglong (尹成龙)
CHEN Aiguo (陈爱国)
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core Journal (北大核心)
2023, No. 4, pp. 63-71 (9 pages)
Keywords
named entity recognition
multiple embeddings
attention mechanism
radical information