Abstract
Chinese person name identification is an important subproblem of named entity recognition in natural language processing. This paper divides the identification process into three stages: extraction, classification, and disambiguation. Candidate Chinese person names are extracted from the text using probability information on the characters used in Chinese surnames and given names, together with the lexical, syntactic, and semantic features of their context. Deciding whether a candidate is a real person name is treated as a binary classification problem, and a decision tree algorithm makes the preliminary decision. Finally, ambiguities in the preliminary classification results are resolved. Experimental results show that both the recall and the precision of this method can exceed 90%.
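The extraction stage can be illustrated with a minimal sketch. It is not the authors' implementation: the probability tables, the threshold, the function name extract_candidates, and the scoring rule (a simple product of character probabilities) are all assumptions made for illustration, since the abstract does not give these details.

```python
# Hypothetical character probabilities (illustrative values, not from the paper).
SURNAME_PROB = {"王": 0.07, "李": 0.07, "张": 0.06}
GIVEN_PROB = {"伟": 0.02, "芳": 0.015, "明": 0.02, "小": 0.01}


def extract_candidates(text, threshold=1e-5, max_given_len=2):
    """Return (start, candidate, score) tuples for surname + given-name spans.

    Assumed scoring rule: P(surname char) times the product of
    P(given-name char) over the following one or two characters.
    """
    candidates = []
    for i, ch in enumerate(text):
        p_surname = SURNAME_PROB.get(ch)
        if p_surname is None:
            continue  # this character is not a known surname
        for n in range(1, max_given_len + 1):
            given = text[i + 1 : i + 1 + n]
            if len(given) < n:
                break  # ran past the end of the text
            score = p_surname
            for g in given:
                score *= GIVEN_PROB.get(g, 0.0)
            if score >= threshold:
                candidates.append((i, ch + given, score))
    return candidates


if __name__ == "__main__":
    for start, name, score in extract_candidates("昨天王小明去了北京"):
        print(start, name, score)
```

In this sketch a single surname can produce overlapping candidates (for example a two-character and a three-character span starting at the same position). That overlap is the kind of case the later classification and disambiguation stages would have to settle.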
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journals
2004, Issue 6, pp. 10-15 (6 pages)
Funding
Natural Science Foundation grant (60496326)
Project funded by Fuji Xerox Co., Ltd. (Japan)
Keywords
artificial intelligence
natural language processing
Chinese person name identification
decision tree