tesseract · My Work Notes

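# OCR: Image Text Recognition

> Source: <https://zhuanlan.zhihu.com/p/38451718>
>
> Project: <https://github.com/tesseract-ocr/tesseract>

English text is recognized with close to 100% accuracy. Garbled output for Chinese means the Chinese trained data has not been installed.

Trained data for other languages: <https://github.com/tesseract-ocr/tessdata>

#### Installation

Windows

```
https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v4.0.0-beta.1.20180608.exe
```

ubuntu/debian/deepin

```bash
$ sudo apt-get install tesseract-ocr
```

Python library

```bash
$ pip3 install pytesseract
```

#### Usage

- Command line (the recognized text is written to `<output base name>.txt`)

```
tesseract <image file> <output base name>
```

- Python

```python
import pytesseract
from PIL import Image

# On Windows, point pytesseract at the executable if it is not on PATH:
# pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'

# Open the image and run OCR on it
text = pytesseract.image_to_string(Image.open('0.png'))
print(text)
```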
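
The notes above say that garbled Chinese output comes from missing trained data. Once the data is installed, the language still has to be selected explicitly. Below is a minimal sketch of doing that through the command-line form shown above, using the `-l` option. The helper names `build_tesseract_command` and `ocr_to_text`, the file names `0.png` and `out`, and the choice of `chi_sim` (Simplified Chinese) are illustrative assumptions, not part of the original notes.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Build the argument list for: tesseract <image> <output base> -l <lang>.

    Hypothetical helper for illustration; `lang` must match an installed
    trained-data file (for example chi_sim.traineddata).
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_to_text(image_path, output_base, lang="eng"):
    """Run tesseract and return the recognized text.

    Assumes the tesseract executable is on PATH and the requested
    language data is installed.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    # tesseract appends ".txt" to the output base name
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # Only prints the command; call ocr_to_text("0.png", "out", lang="chi_sim")
    # on a machine where tesseract and the chi_sim data are installed.
    print(build_tesseract_command("0.png", "out", lang="chi_sim"))
```

With `pytesseract`, the same language selection should be possible by passing `lang='chi_sim'` to `pytesseract.image_to_string`, again assuming the trained data is installed.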