Abstract
Traditional Word Sense Disambiguation (WSD) methods rely only on contextual information, which often leads to inaccurate output. To address this problem, a multi-strategy unsupervised automatic WSD method is proposed. The method draws on the rich semantic knowledge extracted from online Wikipedia and linearly fuses three kinds of features: contextual information, background knowledge, and semantic information. The weight of each feature is learned with the logistic regression algorithm, and the candidate sense with the maximum fused value is selected as the best sense. Experiments on the SENSEVAL dataset show an average precision of 85.50%, which validates the effectiveness of the method.
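The selection step described in the abstract (linear fusion of three feature scores followed by picking the candidate with the largest fused value) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the weight values, the feature scores, the example senses, and the function names `fuse` and `disambiguate` do not come from the paper, which learns its weights with logistic regression and computes its scores from Wikipedia.

```python
from typing import Dict, Tuple

# Hypothetical weights for the three features. The paper learns these with
# logistic regression; the values below are placeholders for illustration.
WEIGHTS: Dict[str, float] = {"context": 0.5, "background": 0.2, "semantic": 0.3}


def fuse(scores: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Linear fusion: weighted sum of the per-feature scores of one candidate."""
    return sum(weights[name] * scores[name] for name in weights)


def disambiguate(
    candidates: Dict[str, Dict[str, float]],
    weights: Dict[str, float] = WEIGHTS,
) -> Tuple[str, float]:
    """Return the candidate sense with the maximum fused value and that value."""
    best = max(candidates, key=lambda sense: fuse(candidates[sense], weights))
    return best, fuse(candidates[best], weights)


# Made-up feature scores for two candidate senses of the word "bank".
candidates = {
    "bank (financial institution)": {"context": 0.8, "background": 0.6, "semantic": 0.7},
    "bank (river side)": {"context": 0.3, "background": 0.4, "semantic": 0.2},
}

best_sense, best_score = disambiguate(candidates)
print(best_sense, round(best_score, 2))
```

With these placeholder numbers the financial sense wins because its weighted sum is larger. How the three feature scores are actually computed is not specified in the abstract, so the sketch takes them as given inputs.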
Source
Computer Engineering (《计算机工程》)
Indexed in: CAS, CSCD, PKU Core Journals
2009, Issue 18, pp. 62-64, 66 (4 pages)
Funding
National High Technology Research and Development Program of China ("863" Program), Grant No. 2007AA01Z137
Keywords
Word Sense Disambiguation(WSD)
Wikipedia
knowledge base
unsupervised learning