Abstract
Objective Laboratory sheets provide a faithful record of a patient's health status, so converting paper laboratory sheets into electronic medical records is valuable for insurance claims, hospital transfers, remote consultation, and the establishment of health records. However, clinical practice still lacks a tool that can recognize the contents of a laboratory sheet and convert it directly into an electronic medical record. This paper therefore designs a complete optical character recognition (OCR) method for automatically recognizing the contents of medical laboratory sheets.
Methods The laboratory sheet image was first preprocessed: it was binarized with the Otsu method, and the Hough transform was used for deskewing and feature extraction. The contents were then recognized using Tesseract's beam search algorithm and the k-nearest neighbor algorithm. The character library was trained, and a medical dictionary file and an ambiguous-character (unicharambigs) file were used to correct the recognized text. A medical laboratory sheet OCR engine was built on this basis. Finally, the accuracy of the engine was evaluated on 302 laboratory sheet records collected from a community hospital in Shanghai.
Results The recognition accuracy of the proposed method was 92.72%, which basically meets clinical needs.
Conclusion The Tesseract-based OCR engine removes the need to enter laboratory sheet data by hand. A doctor only needs to photograph a laboratory sheet and upload the photo, and the engine converts its contents into a structured electronic medical record, which greatly improves doctors' efficiency and facilitates further use of the data.
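The Otsu binarization step named in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `otsu_threshold` and `binarize` are hypothetical, the input is assumed to be 8-bit grayscale values (0 to 255), and the pure-Python loop is chosen for readability rather than speed. The threshold is the gray level that maximizes the between-class variance of the two resulting pixel groups.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    `pixels` is a flat, non-empty sequence of integers in 0..255 (assumed).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    w0 = 0        # number of pixels in the lower class so far
    sum0 = 0      # sum of gray levels in the lower class so far
    best_t, best_var = 0, -1.0

    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty, nothing left to split
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(image):
    """Map a 2-D list of gray levels to 0 (dark) or 255 (light)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[255 if p > t else 0 for p in row] for row in image]
```

On a photographed laboratory sheet, dark ink would fall into the lower class and light paper into the upper class, which yields the black-and-white image that the later deskewing and recognition steps operate on.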
Authors
张淙悦
尹梓名
孙大运
戴维
ZHANG Congyue; YIN Ziming; SUN Dayun; DAI Wei (School of Medical Instrument and Food Engineering, University of Shanghai for Science and Technology, Shanghai 200093)
Source
《北京生物医学工程》
2019, Issue 3, pp. 283-289 (7 pages)
Beijing Biomedical Engineering
Funding
Supported by the National Natural Science Foundation of China (81801797)
Keywords
laboratory sheet
optical character recognition
image processing
error correction