Abstract
In domain-specific word segmentation, the quality of term extraction strongly affects segmentation precision, so high-precision term extraction is a foundation for domain segmentation. This paper proposes a term extraction method for a specific domain that combines statistics and rules. Starting from the 5-best results produced by Conditional Random Fields, terms are extracted using rules and a scoring mechanism, and the extracted results are then post-processed with rules. Experiments show that, compared with conventional term extraction based on the 1-best output of Conditional Random Fields, this method clearly improves the recall of out-of-vocabulary terms.
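The abstract does not spell out the rules or the scoring scheme, so the following Python sketch is only one illustrative way to pool candidates from n-best labelings. It assumes each labeling is a BIO tag sequence paired with a probability, and it scores a candidate span by the share of n-best probability mass that contains it. The function names, the BIO scheme, the scoring formula, and the threshold are all assumptions, not the paper's method. The rule-based post-processing step is not sketched.

```python
from collections import defaultdict

def spans_from_bio(tags):
    """Return (start, end) spans, end exclusive, for each term in a BIO tag list."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            if start is None:  # tolerate an I with no preceding B
                start = i
        else:  # "O" closes any open span
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans

def extract_terms(chars, nbest, threshold=0.4):
    """Pool candidate terms from n-best labelings and keep the well-supported ones.

    nbest: list of (tags, probability) pairs, e.g. 5-best CRF outputs.
    Score (hypothetical): summed probability of the labelings that contain
    the span, divided by the total probability of the n-best list.
    """
    total = sum(p for _, p in nbest)
    scores = defaultdict(float)
    for tags, p in nbest:
        for span in set(spans_from_bio(tags)):
            scores[span] += p / total
    kept = sorted(span for span, score in scores.items() if score >= threshold)
    return [("".join(chars[s:e]), scores[(s, e)]) for s, e in kept]

# Toy input: three hypothetical labelings of a 7-character string.
chars = list("条件随机场模型")
nbest = [
    (["B", "I", "I", "I", "I", "O", "O"], 0.5),
    (["B", "I", "I", "I", "I", "I", "I"], 0.3),
    (["O", "O", "B", "I", "I", "O", "O"], 0.2),
]
terms = extract_terms(chars, nbest)
```

Pooling over several labelings lets a span that is missed by the 1-best output still be proposed as a candidate, which is the intuition behind using n-best results to raise recall.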
Source
Journal of Shenyang Aerospace University (《沈阳航空航天大学学报》)
2011, No. 5, pp. 71-74 (4 pages)