Abstract
To address the trade-off between feature-extraction speed and recognition accuracy in script identification for document images, this paper proposes a script identification method based on stroke direction histograms. The stroke direction histogram describes how the distribution of stroke directions differs across scripts and serves as the extracted feature; a Support Vector Machine (SVM) is then trained on these features and used to classify the script. Experiments were performed on degraded document images in ten languages, including Chinese, English, Russian, Japanese, Korean, and Arabic. The results show that the method is fast, achieves high identification accuracy, and is robust to image degradation.
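The abstract does not give the paper's exact definition of the stroke direction histogram, so the following is only a minimal sketch of the general idea, under stated assumptions: the input is a binarized image (1 = ink), local stroke direction is approximated as the direction perpendicular to the central-difference intensity gradient, and directions are folded into [0, π) and accumulated into `bins` bins centred on multiples of π/`bins`. The function name `stroke_direction_histogram` and the parameter `bins` are hypothetical and are not taken from the paper.

```python
import math


def stroke_direction_histogram(image, bins=8):
    """Return a normalized histogram of approximate stroke directions.

    image: 2D list of 0/1 ints, 1 = ink (assumed already binarized).
    bins:  number of direction bins over [0, pi); bin k is centred on k*pi/bins.
    Angles are measured in image coordinates (y grows downward).
    This is an illustrative approximation, not the paper's definition.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    hist = [0.0] * bins

    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central-difference gradient of the binary image.
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue  # flat region: no edge, so no direction evidence

            # A stroke edge runs perpendicular to the gradient.
            # Fold the angle into [0, pi) because direction has no sign.
            theta = (math.atan2(gy, gx) + math.pi / 2) % math.pi

            # Round to the nearest bin centre; wrap so ~pi maps back to bin 0.
            idx = int(theta / math.pi * bins + 0.5) % bins
            hist[idx] += mag

    total = sum(hist)
    return [v / total for v in hist] if total else hist


if __name__ == "__main__":
    # A single horizontal bar: all edge evidence should fall in bin 0.
    horizontal = [[1 if y == 3 else 0 for x in range(9)] for y in range(7)]
    print(stroke_direction_histogram(horizontal))
```

In a pipeline like the one the abstract describes, a vector of this kind would be computed for each training image and passed to an SVM classifier (for example scikit-learn's `sklearn.svm.SVC`, which is an assumption here; the abstract does not name an implementation or a kernel).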
Source
Journal of Information Engineering University (《信息工程大学学报》)
2011, No. 2, pp. 231-237 (7 pages)
Funding
National Natural Science Foundation of China (Grant No. 60970172)
Keywords
document image
script identification
stroke direction histogram
support vector machine