Abstract
This paper presents a hybrid model, based on the maximum entropy principle, for the automatic recognition of Chinese person names and location names. The model consists of a training module and a recognition module. Features are first extracted from the training corpus, and their weights are estimated with the maximum entropy method. The trained features are then combined with a dynamic word list and a small set of rules to recognize Chinese person names and location names in the test text. The recognition results are reasonably satisfactory, and the paper closes with an analysis of the experimental results.
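The train-then-recognize flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the tag set, the three feature templates, the two toy training sentences, and the plain gradient-ascent training loop are all hypothetical. It covers only the maximum entropy part; the dynamic word list and the rules are not modelled.

```python
import math
from collections import defaultdict

# Hypothetical tag set: begin/inside person name, begin/inside location name, other.
LABELS = ["B-PER", "I-PER", "B-LOC", "I-LOC", "O"]

# Toy training data, invented for this sketch only.
TRAIN = [
    ("王小明去北京", ["B-PER", "I-PER", "I-PER", "O", "B-LOC", "I-LOC"]),
    ("李华在上海", ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]),
]


def extract_features(chars, i):
    """Context features for the character at position i (illustrative templates)."""
    prev_c = chars[i - 1] if i > 0 else "<s>"
    next_c = chars[i + 1] if i + 1 < len(chars) else "</s>"
    return [f"cur={chars[i]}", f"prev={prev_c}", f"next={next_c}"]


def predict_proba(weights, feats):
    """Maximum entropy form: p(y | x) = exp(sum_i w_i * f_i(x, y)) / Z(x)."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def build_samples(data):
    """Turn tagged sentences into (feature list, gold label) pairs."""
    samples = []
    for text, tags in data:
        chars = list(text)
        for i, gold in enumerate(tags):
            samples.append((extract_features(chars, i), gold))
    return samples


def train(samples, epochs=200, lr=0.5):
    """Batch gradient ascent on the conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in samples:
            probs = predict_proba(weights, feats)
            for f in feats:
                for y in LABELS:
                    # Observed feature count minus expected count under the model.
                    grad[(f, y)] += (1.0 if y == gold else 0.0) - probs[y]
        for key, g in grad.items():
            weights[key] += lr * g / len(samples)
    return weights


def tag(weights, text):
    """Label each character independently with its most probable tag."""
    chars = list(text)
    result = []
    for i in range(len(chars)):
        probs = predict_proba(weights, extract_features(chars, i))
        result.append(max(probs, key=probs.get))
    return result


weights = train(build_samples(TRAIN))
print(tag(weights, "王华在北京"))
```

A real system would add regularization and richer feature templates, and would pass the per-character tags through the word-list and rule stages that the abstract mentions.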
Source
《小型微型计算机系统》
CSCD
PKU Core (Peking University core journal list)
2006, No. 9, pp. 1761-1765 (5 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60203010).
Keywords
maximum entropy (ME) model
named entity recognition (NER)
feature extraction
linguistic rules