Abstract
At present, most syntactic dependency parsing in natural language processing relies on word segmentation results and is carried out mainly by supervised end-to-end models. This approach has two main problems: the annotation schemes are numerous and relatively complex, and nested linguistic structures cannot be recognized. To address these problems, this paper proposes a dependency syntax annotation scheme based on phrase windows, annotates the Chinese Phrase Window Dataset (CPWD), and introduces a phrase window model. The annotation scheme takes the phrase as its smallest unit, divides sentences into 7 types of nestable phrases, and marks the syntactic dependency relations between phrases. Borrowing the idea of object detection from computer vision, the phrase window model detects the start and end positions of phrases and thereby recognizes nested phrases and their syntactic dependencies simultaneously. Experimental results show that on CPWD the phrase window model improves F1 by more than 1 percentage point over a traditional end-to-end model. The method was also applied in the CCL2018 Chinese Metaphor Sentiment Analysis competition, where it raised F1 by more than 1 percentage point over the baseline and took first place.
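The span-based view in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `PhraseWindow` class, the labels `S`, `NP`, `MOD`, and `VP`, the relation names, and the example sentence are all hypothetical, since the abstract does not list the 7 phrase types or the model details. The sketch only shows how nested phrases can be stored as start/end windows with a dependency link, how crossing (non-nested) windows can be detected, and how a span-level F1 might be computed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PhraseWindow:
    """A phrase given by inclusive token offsets (hypothetical structure)."""
    start: int                      # index of the first token in the phrase
    end: int                        # index of the last token in the phrase
    label: str                      # phrase type; placeholder names, not the paper's 7 types
    head: Optional[int] = None      # position of the governing window in the list, None for the root
    relation: Optional[str] = None  # dependency relation to the head; placeholder names


def contains(outer: PhraseWindow, inner: PhraseWindow) -> bool:
    """True if `inner` lies entirely inside `outer` (nesting)."""
    return outer.start <= inner.start and inner.end <= outer.end


def crosses(a: PhraseWindow, b: PhraseWindow) -> bool:
    """True if the two windows overlap without one containing the other."""
    overlap = a.start <= b.end and b.start <= a.end
    return overlap and not contains(a, b) and not contains(b, a)


def is_well_nested(windows: List[PhraseWindow]) -> bool:
    """True if no pair of windows crosses, i.e. the windows form a valid nesting."""
    return not any(
        crosses(a, b)
        for i, a in enumerate(windows)
        for b in windows[i + 1:]
    )


def span_f1(pred: List[PhraseWindow], gold: List[PhraseWindow]) -> float:
    """F1 over exact (start, end, label) matches; dependency links are ignored here."""
    p = {(w.start, w.end, w.label) for w in pred}
    g = {(w.start, w.end, w.label) for w in gold}
    tp = len(p & g)
    if tp == 0:
        return 0.0
    precision = tp / len(p)
    recall = tp / len(g)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentence, already segmented into tokens.
tokens = ["我", "喜欢", "红色", "的", "苹果"]

gold = [
    PhraseWindow(0, 4, "S"),                               # whole sentence
    PhraseWindow(2, 4, "NP", head=0, relation="object"),   # nested inside S
    PhraseWindow(2, 3, "MOD", head=1, relation="modifier"),  # nested inside NP
]

pred = [
    PhraseWindow(0, 4, "S"),
    PhraseWindow(2, 4, "NP", head=0, relation="object"),
    PhraseWindow(1, 3, "VP", head=0, relation="predicate"),  # crosses the NP window
]

print(is_well_nested(gold), is_well_nested(pred), round(span_f1(pred, gold), 3))
```

Here two of the three predicted windows match the gold windows exactly, so precision and recall are both 2/3. Treating phrases as windows rather than as one tag per token is what lets several phrases cover the same tokens, which a flat sequence-labeling scheme cannot express directly.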
Authors
刘广
涂刚
李政
刘译键
LIU Guang; TU Gang; LI Zheng; LIU Yijian (School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China)
Source
《中文信息学报》
CSCD
Peking University Core Journals (北大核心)
2024, Issue 2, pp. 15-24 (10 pages)
Journal of Chinese Information Processing
Keywords
natural language processing
tagging system
phrase extraction
dependency parsing