Abstract
In English, global features of each word in a document can effectively enhance entity recognition. Unlike English, Chinese has no explicit delimiters, and the basic unit of model learning is the character rather than the word. Introducing global features for characters therefore increases the difficulty of model learning. To address this issue, after the model extracts a contextual representation for each character, it first obtains the character's different contextual representations across the document, then applies multiple filters to these representations, and finally uses a gated attention mechanism to control the prediction weight of the global features. Experimental results show that the proposed model outperforms baseline models on the Resume, Weibo, and OntoNotes 4.0 datasets.
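The abstract does not give the gating formula, but the gated attention step it describes is typically a learned gate that interpolates between a character's local contextual representation and its document-level (global) representation. The sketch below is a minimal, hypothetical illustration of such scalar-gated fusion; the function name, weight vectors, and bias are assumptions, not the paper's actual parameterization.

```python
import math

def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(local_repr, global_repr, w_local, w_global, bias):
    """Hypothetical scalar-gated fusion of a character's local and
    global representations:

        gate  = sigmoid(w_local . local + w_global . global + bias)
        fused = gate * local + (1 - gate) * global

    The gate controls how much weight the global features receive
    in the final prediction, as the abstract describes.
    """
    score = bias
    score += sum(wl * xl for wl, xl in zip(w_local, local_repr))
    score += sum(wg * xg for wg, xg in zip(w_global, global_repr))
    gate = sigmoid(score)
    return [gate * xl + (1.0 - gate) * xg
            for xl, xg in zip(local_repr, global_repr)]
```

With zero weights and zero bias the gate is 0.5 and the fused vector is the simple average of the two representations; a strongly positive gate score suppresses the global features, which is the behavior the gating mechanism is meant to learn when global context is noisy.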
Authors
Chang Jun; Liu Jinhua; Liu Feng (Fenyang College of Shanxi Medical University, Fenyang 032200, China)
Source
Modern Computer (《现代计算机》), 2024, No. 23, pp. 97-102 (6 pages)
Funding
Teaching Reform and Innovation Project of Shanxi Higher Education Institutions (J20231658, J20241694).
Keywords
Chinese named entity recognition; global features; filtering mechanism; gated attention