Abstract
This paper presents an approach to Chinese part-of-speech (POS) tagging based on conditional random fields (CRFs). Because CRFs can incorporate arbitrary features, the method uses the contextual information of words and also adds new statistical features aimed at multi-category (ambiguous) words and out-of-vocabulary words. In closed and open tests on the January portion of the People's Daily corpus, the method reaches tagging accuracies of 98.56% and 96.60%, respectively.
Source
Computer Science (《计算机科学》)
CSCD
PKU Core Journals (北大核心)
2006, No. 10, pp. 148-151, 155 (5 pages)
Keywords
Part-of-speech tagging, Conditional random fields (CRFs), Viterbi decoding
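
For readers unfamiliar with the Viterbi decoding named in the keywords, the following is a minimal sketch of how the best tag sequence can be recovered from a linear-chain model such as a CRF. It is not the paper's implementation: the function name `viterbi`, the two-tag set, and all scores are hypothetical values chosen for illustration, and the scores stand in for the log-potentials a trained model would produce.

```python
from typing import Dict, List, Tuple


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain model.

    emissions[i][t] is the score of tag t at position i.
    transitions[(p, t)] is the score of moving from tag p to tag t.
    Missing entries are treated as 0.0 (an assumption of this sketch).
    """
    if not emissions:
        return []
    # best[i][t]: best score of any path that ends in tag t at position i
    best = [{t: emissions[0].get(t, 0.0) for t in tags}]
    # back[i - 1][t]: previous tag on that best path
    back: List[Dict[str, str]] = []
    for i in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for t in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, t), 0.0),
            )
            scores[t] = (
                best[-1][prev]
                + transitions.get((prev, t), 0.0)
                + emissions[i].get(t, 0.0)
            )
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to the start.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores for a three-word sentence and a two-tag set.
TAGS = ["n", "v"]
EMISSIONS = [
    {"n": 2.0, "v": 0.5},
    {"n": 1.0, "v": 1.2},
    {"n": 1.5, "v": 0.2},
]
TRANSITIONS = {
    ("n", "v"): 1.0,
    ("v", "n"): 1.0,
    ("n", "n"): -1.0,
    ("v", "v"): -1.0,
}

if __name__ == "__main__":
    print(viterbi(EMISSIONS, TRANSITIONS, TAGS))  # ['n', 'v', 'n']
```

The dynamic program keeps one best score per tag per position, so decoding costs time proportional to the sentence length times the square of the number of tags, rather than enumerating every possible tag sequence.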