Abstract
Optical Character Recognition (OCR) is the process of scanning an image of text, analyzing and processing the image, and extracting the textual content it contains. Current OCR algorithms, however, generally perform poorly on long, curved text. To address this, a recognition-oriented preprocessing algorithm for long curved text is proposed: a Long Curve Text Processing (LCTP) module is inserted before text line recognition to improve the recognition accuracy of all text lines in an image. First, after text region detection, each long curved text line is extracted individually and interfering (noise) information is removed. Second, the key inflection points of each curved text line are computed from the line's features. Third, the text line is segmented and merged at these key inflection points. Finally, the segmented and merged text line is fed into a text line recognition model to obtain the final recognition result. Comparative experiments on Long Curve Text, a dataset of manually collected long curved text images, against the mainstream OCR frameworks PP-OCR and Tesseract OCR show improvements in all three metrics: LA (Localization Accuracy), MED (Minimum Edit Distance), and NED (Normalized Edit Distance). Compared with PP-OCR, LA increases by 49.5%, while MED and NED decrease by 44115 and 0.182, respectively; compared with Tesseract OCR, LA increases by 3.2%, while MED and NED decrease by 30282 and 0.125, respectively. In addition, ablation experiments on the Long Curve Text dataset verify the effectiveness of LCTP, and timing comparisons of the individual LCTP components verify its efficiency. The results show that LCTP improves the recognition accuracy of long curved text and, overall, yields more accurate and reliable recognition results.
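The abstract does not specify how key inflection points are computed or how the pieces are merged. The sketch below is only an illustration under stated assumptions: it assumes the detector yields a centerline polyline for each curved text line, treats a vertex as a key inflection point when the heading of the line turns by more than a threshold, splits the polyline at those points, and "merges" the pieces by laying their straightened lengths end to end on one horizontal line. The function names and the 20° threshold are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def turning_angles(line: List[Point]) -> List[float]:
    """Absolute change of heading (radians) at each interior vertex of a polyline."""
    headings = [math.atan2(q[1] - p[1], q[0] - p[0]) for p, q in zip(line, line[1:])]
    angles = []
    for h0, h1 in zip(headings, headings[1:]):
        # Wrap the heading difference into [-pi, pi) before taking its magnitude.
        d = (h1 - h0 + math.pi) % (2 * math.pi) - math.pi
        angles.append(abs(d))
    return angles


def key_points(line: List[Point], thresh_deg: float = 20.0) -> List[int]:
    """Hypothetical criterion: interior vertices whose turning angle exceeds thresh_deg."""
    thresh = math.radians(thresh_deg)
    # Angle k sits between segment k and segment k+1, i.e. at vertex k+1.
    return [k + 1 for k, a in enumerate(turning_angles(line)) if a > thresh]


def split_at(line: List[Point], cuts: List[int]) -> List[List[Point]]:
    """Split the polyline at the given vertex indices; each cut vertex is shared by both pieces."""
    pieces, start = [], 0
    for c in cuts:
        pieces.append(line[start:c + 1])
        start = c
    pieces.append(line[start:])
    return pieces


def arc_length(piece: List[Point]) -> float:
    """Length of a polyline piece, i.e. its width once straightened."""
    return sum(math.dist(p, q) for p, q in zip(piece, piece[1:]))


def merge_offsets(pieces: List[List[Point]]) -> Tuple[List[float], float]:
    """x-offset of each straightened piece laid left to right, plus the total merged width."""
    offsets, x = [], 0.0
    for piece in pieces:
        offsets.append(x)
        x += arc_length(piece)
    return offsets, x


if __name__ == "__main__":
    # A line that runs horizontally, then bends upward by 45 degrees at (20, 0).
    line = [(0, 0), (10, 0), (20, 0), (30, 10), (40, 20)]
    cuts = key_points(line)
    pieces = split_at(line, cuts)
    offsets, width = merge_offsets(pieces)
    print(cuts, len(pieces), offsets, round(width, 3))
```

In a real pipeline each piece would also be cropped from the image along its polygon and rotated to horizontal before concatenation and recognition; that image-warping step is omitted here to keep the sketch dependency-free.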
Authors
LIU Xintian
FENG Jie
ZHU Minghang
MA Hanjie
ZHENG Yayu
(School of Computer Science and Technology (School of Artificial Intelligence), Zhejiang Sci-Tech University, Hangzhou 310018, China; College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China)
Source
Intelligent Computer and Applications
2024, No. 12, pp. 10-17 (8 pages)
Funding
Zhejiang Provincial Science and Technology Program (2021C01163).
Keywords
long curved text
noise information
key inflection points
segmentation
merging