Research on Chinese Person Name and Location Name Recognition Based on Maximum Entropy Model
Cited by: 26
Abstract This paper presents a hybrid model, based on the maximum entropy principle, for the automatic recognition of Chinese person names and location names. The model consists of a training module and a recognition module. Features are first extracted from the training corpus and trained with the maximum entropy method. The trained features are then combined with a dynamic word list and a small set of rules to recognize Chinese person names and location names in the test text. The recognition results are reasonably satisfactory, and the paper closes with an analysis of the experimental results.
Source Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2006, No. 9, pp. 1761-1765 (5 pages)
Funding Supported by the National Natural Science Foundation of China (Grant No. 60203010).
Keywords maximum entropy (ME) model; named entity recognition (NER); feature extraction; linguistic rules