Abstract
On-line handwritten Chinese character recognition has moved from single-character recognition to multi-character and even whole-document recognition, so system performance depends greatly on correct character segmentation. To improve segmentation and recognition performance on full-page documents, this paper proposes a segmentation algorithm for on-line handwritten Chinese characters. Rule-based stroke merging first uses geometric information to combine individual strokes into character blocks. Dynamic programming then searches for the optimal segmentation path, using a cost matrix built from the width-height ratio, size, inter-character distance, and recognition information of the blocks. To cope with the variability of handwriting, a plastic algorithm trims a character's bounding box so that it excludes some parts of the character rather than enclosing all of it. Experiments show that the algorithm is effective both for multi-line samples containing many characters and for characters that touch each other.
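The abstract does not give the paper's merge rules, cost weights, or recognizer, so the following is only a minimal sketch of the general pipeline it describes, under stated assumptions. Strokes on one text line are reduced to bounding boxes; a hypothetical `merge_strokes` rule joins strokes whose horizontal extents overlap into character blocks; and `segment` runs dynamic programming over block boundaries, scoring each candidate character (up to a hypothetical `max_merge` consecutive blocks) by width-height ratio, size relative to an assumed character size `char_size`, internal gaps, and an optional hypothetical `recog_cost` hook standing in for recognition information. All weights are assumed equal to 1, and the plastic bounding-box trimming step is not modeled.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (x grows rightwards, y downwards)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        return self.x1 - self.x0

    @property
    def height(self) -> float:
        return self.y1 - self.y0


def union(boxes: Sequence[Box]) -> Box:
    """Smallest box enclosing all given boxes."""
    return Box(min(b.x0 for b in boxes), min(b.y0 for b in boxes),
               max(b.x1 for b in boxes), max(b.y1 for b in boxes))


def merge_strokes(strokes: Sequence[Box]) -> List[Box]:
    """Hypothetical merge rule: strokes whose x-ranges overlap form one block."""
    blocks: List[Box] = []
    for s in sorted(strokes, key=lambda b: b.x0):
        if blocks and s.x0 <= blocks[-1].x1:
            blocks[-1] = union([blocks[-1], s])  # overlaps current block: merge
        else:
            blocks.append(s)                     # clear gap: start a new block
    return blocks


def candidate_cost(blocks: Sequence[Box], char_size: float,
                   recog_cost: Optional[Callable[[Box], float]] = None) -> float:
    """Cost of treating consecutive blocks as one character (assumed weights = 1)."""
    box = union(blocks)
    eps = 1e-6
    aspect = abs(math.log(max(box.width, eps) / max(box.height, eps)))  # ideal: square
    size = abs(box.width - char_size) / char_size  # ideal: about one character wide
    gaps = sum(max(0.0, b.x0 - a.x1) for a, b in zip(blocks, blocks[1:]))
    gap = gaps / char_size  # wide internal gaps suggest two separate characters
    recog = recog_cost(box) if recog_cost else 0.0  # e.g. 1 - classifier confidence
    return aspect + size + gap + recog


def segment(blocks: List[Box], char_size: float, max_merge: int = 3,
            recog_cost: Optional[Callable[[Box], float]] = None
            ) -> Tuple[float, List[Tuple[int, int]]]:
    """DP over block boundaries; returns (total cost, [(start, end), ...])."""
    n = len(blocks)
    best = [math.inf] * (n + 1)  # best[j]: minimum cost to segment blocks[:j]
    back = [0] * (n + 1)         # back[j]: start index of the last character
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_merge), j):
            c = best[i] + candidate_cost(blocks[i:j], char_size, recog_cost)
            if c < best[j]:
                best[j], back[j] = c, i
    path, j = [], n
    while j > 0:                 # backtrack the optimal segmentation path
        path.append((back[j], j))
        j = back[j]
    return best[n], path[::-1]


strokes = [Box(0, 0, 4, 5), Box(1, 5, 3, 10),  # left component of one character
           Box(5, 0, 10, 10),                  # right component of the same character
           Box(13, 0, 23, 10)]                 # a separate character
blocks = merge_strokes(strokes)
cost, path = segment(blocks, char_size=10)
print(path)  # [(0, 2), (2, 3)]
```

Here the two left blocks are joined into one roughly square character because their combined box matches the assumed character size and the gap between them is small, while the wider gap before the third block keeps it separate. Replacing the zero default of `recog_cost` with a real classifier score is where recognition information would enter the cost.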
Source
Journal of Tsinghua University (Science and Technology)
EI
CAS
CSCD
Peking University Core Journal
2004, No. 10, pp. 1417-1421 (5 pages)
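Funding
National High Technology Research and Development Program of China ("863" Program) (2001AA114081)
National Natural Science Foundation of China (60241005)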
Keywords
character recognition
dynamic programming
Chinese handwritten character segmentation
plastic algorithm (bounding-box trimming)