Abstract
Chinese word segmentation is a fundamental task in natural language processing. With the growth of text data, research on Chinese word segmentation is of great significance. Jieba is a commonly used Chinese word segmentation technology with high segmentation accuracy. This paper studies a method of speeding up Jieba word segmentation: the core algorithm of the segmentation technology is implemented in Cython and applied to segmenting Chinese text. Experiments on the ICC Chinese data set show that the proposed acceleration method improves segmentation speed by 63.9%.
Author
Wei Renyu (College of Computer and Electronics Information, Guangxi University, Nanning, Guangxi 530004, China)
Source
Information & Computer (《信息与电脑》), 2020, No. 10, pp. 26-29 (4 pages)
Keywords
Chinese word segmentation; natural language processing; jieba segmentation