A Chinese Part-of-speech Tagging Approach Using Conditional Random Fields (CRFs)    Cited by: 56
Abstract: This paper presents a new approach to part-of-speech (POS) tagging of Chinese text using conditional random fields (CRFs). Taking advantage of the ability of CRFs to accept arbitrary features as input, the approach uses the contextual information of words and also adds new statistical features targeted at multi-category words (words that can take more than one POS tag) and out-of-vocabulary words. In closed and open tests on the January portion of the People's Daily corpus, the approach achieves POS tagging accuracies of 98.56% and 96.60%, respectively.
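The abstract says the CRF lets the authors combine word-context features with new statistical features for multi-category and out-of-vocabulary words, but it does not spell out the feature templates. The Python sketch below is a hypothetical illustration of that kind of feature extraction, not the paper's feature set: the lexicon format (word mapped to counts of the tags it received in training), the two-word context window, and the feature names (`oov`, `cand`, `top_tag`, and so on) are all assumptions. Tag names follow the Peking University tag set (r = pronoun, v = verb, vn = verbal noun).

```python
from collections import Counter

# Hypothetical lexicon built from training data: word -> counts of its POS tags.
Lexicon = dict[str, Counter]


def extract_features(words: list[str], i: int, lexicon: Lexicon) -> list[str]:
    """Return string-valued observation features for the word at position i."""
    feats = []
    # Context features: words in a window of two on each side, padded at the edges.
    for offset in (-2, -1, 0, 1, 2):
        j = i + offset
        if 0 <= j < len(words):
            w = words[j]
        else:
            w = "<BOS>" if j < 0 else "<EOS>"
        feats.append(f"w[{offset}]={w}")
    # Bigram of the previous and current word.
    prev = words[i - 1] if i > 0 else "<BOS>"
    feats.append(f"w[-1]|w[0]={prev}|{words[i]}")

    word = words[i]
    counts = lexicon.get(word)
    if counts is None:
        # Out-of-vocabulary word: back off to character-level cues and length.
        feats.append("oov=1")
        feats.append(f"first_char={word[0]}")
        feats.append(f"last_char={word[-1]}")
        feats.append(f"len={min(len(word), 4)}")
    elif len(counts) > 1:
        # Multi-category word: expose every candidate tag and the most frequent one.
        feats.append("multi=1")
        for tag in sorted(counts):
            feats.append(f"cand={tag}")
        feats.append(f"top_tag={counts.most_common(1)[0][0]}")
    else:
        # Unambiguous known word: its single training tag is a strong cue.
        feats.append(f"only_tag={next(iter(counts))}")
    return feats


lexicon = {"我们": Counter({"r": 50}), "研究": Counter({"v": 30, "vn": 12})}
words = ["我们", "研究", "语料库"]
print(extract_features(words, 1, lexicon)[-4:])  # ['multi=1', 'cand=v', 'cand=vn', 'top_tag=v']
print(extract_features(words, 2, lexicon)[-4:])  # ['oov=1', 'first_char=语', 'last_char=库', 'len=3']
```

Each returned string would act as one binary observation feature. In a linear-chain CRF it is paired with the candidate tag at that position, and the model learns one weight per pairing.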
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University core journal list, 2006, No. 10, pp. 148-151, 155 (5 pages)
Keywords: part-of-speech tagging; conditional random fields (CRFs); Viterbi decoding
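Viterbi decoding, listed among the keywords, is the standard way a trained linear-chain CRF selects the highest-scoring tag sequence. The sketch below assumes the per-position tag scores (sums of learned feature weights) and the tag-transition scores have already been computed; the function name `viterbi` and the toy numbers are hypothetical and not taken from the paper.

```python
def viterbi(emit: list[list[float]], trans: list[list[float]],
            start: list[float]) -> tuple[float, list[int]]:
    """Return (best score, best tag-index path) for a linear-chain CRF.

    emit[t][y]:  score of tag y at position t (sum of active feature weights)
    trans[p][y]: score of moving from tag p to tag y
    start[y]:    score of tag y at the first position
    Scores are unnormalized log-potentials, so the best path maximizes their
    sum; the partition function Z(x) is not needed for decoding.
    """
    n, k = len(emit), len(start)
    if n == 0:
        return 0.0, []
    # delta[y]: best score of any path ending in tag y at the current position.
    delta = [start[y] + emit[0][y] for y in range(k)]
    backptr = []  # backptr[t-1][y]: best previous tag for tag y at position t
    for t in range(1, n):
        new_delta, bp = [], []
        for y in range(k):
            best_prev = max(range(k), key=lambda p: delta[p] + trans[p][y])
            new_delta.append(delta[best_prev] + trans[best_prev][y] + emit[t][y])
            bp.append(best_prev)
        delta = new_delta
        backptr.append(bp)
    # Pick the best final tag, then follow back-pointers to recover the path.
    last = max(range(k), key=lambda y: delta[y])
    path = [last]
    for bp in reversed(backptr):
        path.append(bp[path[-1]])
    path.reverse()
    return delta[last], path


tags = ["r", "v", "n"]  # pronoun, verb, noun (Peking University tag set)
# Made-up scores for the three words 我们 / 研究 / 语料库.
emit = [[2.0, 0.1, 0.5], [0.2, 1.0, 1.2], [0.1, 0.3, 1.5]]
trans = [[0.0, 1.0, 0.2], [0.1, 0.0, 1.0], [0.0, 0.2, 0.1]]
start = [0.0, 0.0, 0.0]
score, path = viterbi(emit, trans, start)
print(score, [tags[y] for y in path])  # 6.5 ['r', 'v', 'n']
```

The dynamic program runs in O(n·k²) time for n words and k tags, instead of scoring all kⁿ possible tag sequences.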