
Research on Annotation Rules and Phrase Recognition Algorithm Based on Phrase and Dependency
Abstract: At present, most syntactic dependency parsing in natural language processing operates on word segmentation results and relies mainly on end-to-end models trained by supervised learning. This approach has two main problems: the annotation schemes are numerous and relatively complex, and nested linguistic structures cannot be recognized. To address these problems, this paper proposes a dependency syntax annotation rule based on phrase windows, annotates a Chinese Phrase Window Dataset (CPWD), and introduces a phrase window model. The annotation rule takes the phrase as the minimal unit, divides sentences into 7 types of nestable phrases, and marks the syntactic dependency relations between phrases. Inspired by object detection in computer vision, the phrase window model detects the start and end positions of phrases, recognizing nested phrases and their syntactic dependencies simultaneously. Experimental results show that on the self-built CPWD, the phrase window model outperforms the traditional end-to-end model by more than 1 percentage point in F1. Applied to the CCL2018 Chinese Metaphor Sentiment Analysis Competition, the method improved F1 by more than 1 percentage point over the baseline and won first place.
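The abstract describes the phrase window model only at a high level. As a rough illustration of the object-detection analogy, the sketch below treats each phrase as a window (start, end, type) with a confidence score and keeps windows greedily in score order, discarding any window that crosses (partially overlaps) one already kept while still allowing full nesting, loosely in the spirit of non-maximum suppression. Everything here is an assumption for illustration: the candidate scores, the 0.5 threshold, the placeholder labels `NP`/`VP` (not the paper's seven phrase types), and the hypothetical helpers `decode`, `crosses` and `span_f1`. It is not the authors' decoding procedure, and dependency relations between phrases are not modeled. A span-level F1 is included because the reported comparison is stated in F1.

```python
from typing import List, NamedTuple


class Window(NamedTuple):
    """A candidate phrase window over token positions (inclusive on both ends)."""
    start: int
    end: int
    label: str    # placeholder phrase type; NOT the paper's 7 categories
    score: float  # hypothetical model confidence in [0, 1]


def crosses(a: Window, b: Window) -> bool:
    """True if the spans partially overlap, i.e. neither one contains the other."""
    return (a.start < b.start <= a.end < b.end) or (b.start < a.start <= b.end < a.end)


def decode(candidates: List[Window], threshold: float = 0.5) -> List[Window]:
    """Greedy NMS-style selection that keeps nested windows but rejects crossing ones."""
    kept: List[Window] = []
    for w in sorted(candidates, key=lambda c: c.score, reverse=True):
        if w.score < threshold:
            break  # candidates are sorted, so all remaining ones score lower
        if any((w.start, w.end) == (k.start, k.end) for k in kept):
            continue  # keep only the best-scoring label for a given span
        if any(crosses(w, k) for k in kept):
            continue  # crossing brackets cannot form a valid nested structure
        kept.append(w)
    # Sort so a containing window always precedes the windows nested inside it.
    return sorted(kept, key=lambda c: (c.start, -c.end))


def span_f1(pred: List[Window], gold: List[Window]) -> float:
    """Exact-match F1 over (start, end, label) triples; scores are ignored."""
    p = {(w.start, w.end, w.label) for w in pred}
    g = {(w.start, w.end, w.label) for w in gold}
    tp = len(p & g)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    tokens = list("我喜欢自然语言处理")  # character-level tokens, indices 0..8
    candidates = [
        Window(3, 8, "NP", 0.90),  # 自然语言处理
        Window(1, 8, "VP", 0.85),  # 喜欢自然语言处理
        Window(3, 6, "NP", 0.80),  # 自然语言 (nested inside 3-8, kept)
        Window(2, 4, "NP", 0.70),  # 欢自然 (crosses 3-8, suppressed)
        Window(0, 0, "NP", 0.30),  # below the threshold
    ]
    for w in decode(candidates):
        print(w.label, "".join(tokens[w.start:w.end + 1]))
```

Run as a script, this prints the VP window 喜欢自然语言处理 followed by the nested NP windows 自然语言处理 and 自然语言; the crossing window 欢自然 is dropped. Unlike standard non-maximum suppression, which removes any sufficiently overlapping box, only crossing spans are suppressed here, so a phrase and the phrases nested inside it can all survive.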
Authors: LIU Guang; TU Gang; LI Zheng; LIU Yijian (School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China)
Source: Journal of Chinese Information Processing (《中文信息学报》; indexed in CSCD and the Peking University core journal list), 2024, No. 2, pp. 15-24 (10 pages)
Keywords: natural language processing; tagging system; phrase extraction; dependency parsing