
Recognition technology of the laboratory sheet based on Tesseract (cited by: 16)
Abstract

Objective: Laboratory sheets faithfully record a patient's health status, so converting paper laboratory sheets into electronic medical records is valuable for insurance claims, hospital transfers, remote consultation, and building health records. At present, however, clinical practice lacks a tool that can recognize the content of a laboratory sheet and convert it directly into an electronic medical record. This paper therefore designs a complete, automated optical character recognition (OCR) method for medical laboratory sheets.

Methods: The laboratory sheet image was first preprocessed: it was binarized with the Otsu method, then deskewed and subjected to feature extraction using the Hough transform. The content was then recognized with Tesseract's beam search algorithm and the K-nearest neighbor algorithm. The character library was trained, and a medical dictionary file together with an ambiguous-character (unicharambigs) file was used to correct the recognized content. An OCR engine for medical laboratory sheets was built on this basis. Finally, the accuracy of the OCR engine was evaluated on 302 laboratory sheet records collected from a community hospital in Shanghai.

Results: In the evaluation, the recognition accuracy of the proposed method was 92.72%, which basically meets clinical needs.

Conclusion: The Tesseract-based OCR engine for medical laboratory sheets removes the need to enter laboratory sheet data by hand. A doctor only needs to photograph a laboratory sheet and upload the photo, and the engine converts its content into a structured electronic medical record. This greatly improves doctors' working efficiency and facilitates further use of the data.
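The abstract names Otsu binarization and dictionary-based correction but gives no implementation details. The following is only a minimal sketch of those two ideas, not the authors' code. It assumes 8-bit grayscale pixels supplied as a flat list, and it substitutes Python's standard `difflib` fuzzy matching for Tesseract's own dictionary and unicharambigs mechanism. The function names, the sample vocabulary, and the 0.8 similarity cutoff are all hypothetical.

```python
import difflib


def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    `pixels` is a flat list of integers in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    w0 = 0        # number of pixels in the lower class
    sum0 = 0      # sum of gray levels in the lower class
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue          # lower class still empty
        if w1 == 0:
            break             # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 1 if it is brighter than the threshold, else 0."""
    return [1 if p > threshold else 0 for p in pixels]


def correct_token(token, vocabulary, cutoff=0.8):
    """Replace an OCR token with its closest vocabulary entry, if any.

    Stand-in for dictionary correction (assumption): uses difflib
    similarity, not Tesseract's dictionary or unicharambigs files.
    """
    matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


# Hypothetical sample data for illustration only.
sample_pixels = [10, 12, 11, 200, 198, 202]
sample_vocabulary = ["hemoglobin", "glucose", "creatinine"]
```

In this sketch the threshold is chosen from the image histogram alone, which is why Otsu's method suits scanned sheets with a roughly bimodal split between ink and paper. The correction step leaves a token unchanged when no vocabulary entry is similar enough, so numeric values and units are not forced onto dictionary words.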
Authors: ZHANG Congyue (张淙悦); YIN Ziming (尹梓名); SUN Dayun (孙大运); DAI Wei (戴维). School of Medical Instrument and Food Engineering, University of Shanghai for Science and Technology, Shanghai 200093
Source: Beijing Biomedical Engineering (《北京生物医学工程》), 2019, No. 3, pp. 283-289 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (81801797)
Keywords: laboratory sheet; optical character recognition; image processing; error correction