A few beginner-friendly Python OCR libraries

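At work and in everyday life you often need to extract the text from an image. In the past that meant typing it out by hand, but as AI has matured over the last few years, more and more OCR products have come onto the market, and they can handle most text extraction needs for ordinary images. Sometimes, though, there are a lot of images to process, or the extraction has to be built into an application, and then you need code. This post introduces a few Python OCR libraries that suit beginners: simple, practical, and good enough for most routine image text extraction and CAPTCHA recognition tasks.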

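pytesseract has to be used together with a locally installed tesseract-ocr.exe. For installing tesseract-ocr.exe, see: Tesseract Ocr文字识别. Note that you must select the Chinese language pack during installation; by default only English recognition is supported.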

The Python library is installed with:

pip install pytesseract

The image to recognize:
[image]
Code:

import pytesseract
from PIL import Image

# Open the image and run OCR with Tesseract's default language (English)
text = pytesseract.image_to_string(Image.open(r"d:\Desktop\39DEE621-40EA-4ad1-90CC-79EB51D39347.png"))
print(text)

Recognition output:

Using Tesseract OCR with Python
from PIL import Image
import pytesseract
import ergperse
import cv2
import os

ap = argparse.ArgunentParser()
ap.add_argument("-i", "--image", required-True,
help="path to input image to be OCR'd")
ap.add_argument("-p", "--preprocess", typesstr, default="thresh",
helpe"type of preprocessing to be done")
args = vars (ap.parse_args())
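
The example above uses Tesseract's default English model. Since the Chinese language pack was selected during installation, here is a minimal sketch of how it could be used. The helper names `join_langs` and `ocr_image` are made up for illustration, and the sketch assumes the `chi_sim` and `eng` packs are installed; `tesseract_cmd` only needs to be set when tesseract.exe is not on PATH.

```python
def join_langs(langs):
    """Join Tesseract language codes into the 'a+b' form its lang option expects."""
    return "+".join(langs)


def ocr_image(image_path, langs=("chi_sim", "eng"), tesseract_cmd=None):
    """OCR one image with the given language packs (assumed to be installed)."""
    # Imported here so join_langs stays usable without the OCR stack installed.
    import pytesseract
    from PIL import Image

    if tesseract_cmd:
        # Point pytesseract at the Tesseract executable explicitly.
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
    return pytesseract.image_to_string(Image.open(image_path), lang=join_langs(langs))


# Hypothetical usage (the path is a placeholder):
# print(ocr_image(r"d:\Desktop\example.png"))
```

Passing several codes joined with `+` lets Tesseract recognize mixed Chinese and English text in one pass.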

PaddleOCR is an open-source, deep-learning-based OCR library from Baidu. Its accuracy on Chinese is quite good, and it can handle most text extraction needs.

Three libraries need to be installed in order, using the commands below. The shapely install may fail on some systems; for a fix, see this blog post: 百度OCR（文字识别）服务使用入坑指南

pip install paddlepaddle
pip install shapely
pip install paddleocr

The image to recognize:
[image]
Code:

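from paddleocr import PaddleOCR, draw_ocr
from PIL import Image

# use_angle_cls=True enables the text-direction classifier; lang="ch" selects the Chinese model
ocr = PaddleOCR(use_angle_cls=True, lang="ch")

img_path = r"d:\Desktop\4A34A16F-6B12-4ffc-88C6-FC86E4DF6912.png"

# In the version used here, each entry of result is [box, [text, score]]
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

# Draw the detected boxes, texts and scores onto the image and display it
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores)
im_show = Image.fromarray(im_show)
im_show.show()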

The recognition output is shown below. It includes the confidence of the text recognized in each region, along with its coordinates:
[image]

Namespace(cls=False, cls_batch_num=30, cls_image_shape='3, 48, 192', cls_model_dir='C:\\Users\\Administrator/.paddleocr/cls', cls_thresh=0.9, det=True, det_algorithm='DB', det_db_box_thresh=0.5, det_db_thresh=0.3, det_db_unclip_ratio=2.0, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_max_side_len=960, det_model_dir='C:\\Users\\Administrator/.paddleocr/det', enable_mkldnn=False, gpu_mem=8000, image_dir=None, ir_optim=True, label_list=['0', '180'], lang='ch', max_text_length=25, rec=True, rec_algorithm='CRNN', rec_batch_num=30, rec_char_dict_path='./ppocr/utils/ppocr_keys_v1.txt', rec_char_type='ch', rec_image_shape='3, 32, 320', rec_model_dir='C:\\Users\\Administrator/.paddleocr/rec/ch', use_angle_cls=True, use_gpu=True, use_space_char=True, use_tensorrt=False, use_zero_copy_run=False)
dt_boxes num : 16, elapse : 0.04799485206604004
cls num  : 16, elapse : 0.1860027313232422
rec_res num  : 16, elapse : 0.4859299659729004
[[[6.0, 2.0], [85.0, 2.0], [85.0, 31.0], [6.0, 31.0]], ['帮助文档', 0.99493873]]
[[[309.0, 13.0], [324.0, 13.0], [324.0, 28.0], [309.0, 28.0]], ['X', 0.9667116]]
[[[82.0, 50.0], [120.0, 50.0], [120.0, 71.0], [82.0, 71.0]], ['目录', 0.993418]]
[[[136.0, 50.0], [176.0, 50.0], [176.0, 71.0], [136.0, 71.0]], ['标题', 0.99969745]]
[[[13.0, 53.0], [60.0, 53.0], [60.0, 70.0], [13.0, 70.0]], ['快捷键', 0.9995322]]
[[[191.0, 49.0], [314.0, 49.0], [314.0, 72.0], [191.0, 72.0]], ['文本样式列表', 0.9967863]]
[[[61.0, 84.0], [120.0, 84.0], [120.0, 101.0], [61.0, 101.0]], ['代码片', 0.9997086]]
[[[134.0, 81.0], [181.0, 84.0], [180.0, 104.0], [132.0, 101.0]], ['表格', 0.9891155]]
[[[187.0, 84.0], [232.0, 84.0], [232.0, 101.0], [187.0, 101.0]], ['注脚', 0.99958]]
[[[13.0, 115.0], [90.0, 115.0], [90.0, 135.0], [13.0, 135.0]], ['自定义列表', 0.99823236]]
[[[109.0, 115.0], [219.0, 115.0], [219.0, 135.0], [109.0, 135.0]], ['LaTeX数学公式', 0.98812836]]
[[[237.0, 115.0], [315.0, 115.0], [315.0, 135.0], [237.0, 135.0]], ['插入甘特图', 0.9982792]]
[[[12.0, 148.0], [94.0, 148.0], [94.0, 167.0], [12.0, 167.0]], ['插入UML图', 0.9926085]]
[[[113.0, 148.0], [249.0, 148.0], [249.0, 167.0], [113.0, 167.0]], ['插入Mermaid流程图', 0.996088]]
[[[11.0, 176.0], [153.0, 176.0], [153.0, 200.0], [11.0, 200.0]], ['插入Flowchart流程图', 0.9780351]]
[[[174.0, 179.0], [237.0, 179.0], [237.0, 200.0], [174.0, 200.0]], ['插入类图', 0.9519753]]
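
The lines above come back in detection order, which is not always reading order (快捷键 is listed after 目录 and 标题 even though it sits to their left). As a rough sketch, the entries could be filtered by confidence and grouped into rows like this. The function name `reading_order` and the `row_tolerance` and `min_score` parameters are assumptions for illustration, not part of PaddleOCR; the sample entries are copied from the output above.

```python
# A few entries copied from the PaddleOCR output: [box, [text, score]]
sample = [
    [[[6.0, 2.0], [85.0, 2.0], [85.0, 31.0], [6.0, 31.0]], ['帮助文档', 0.99493873]],
    [[[82.0, 50.0], [120.0, 50.0], [120.0, 71.0], [82.0, 71.0]], ['目录', 0.993418]],
    [[[13.0, 53.0], [60.0, 53.0], [60.0, 70.0], [13.0, 70.0]], ['快捷键', 0.9995322]],
    [[[174.0, 179.0], [237.0, 179.0], [237.0, 200.0], [174.0, 200.0]], ['插入类图', 0.9519753]],
]


def reading_order(lines, row_tolerance=10.0, min_score=0.0):
    """Group entries into rows (top to bottom) and sort each row left to right."""
    # Drop low-confidence entries, then sort by the y of the top-left corner.
    kept = sorted(
        (line for line in lines if line[1][1] >= min_score),
        key=lambda line: line[0][0][1],
    )
    rows = []  # each item: (y of the row's first box, [entries in that row])
    for line in kept:
        y = line[0][0][1]
        if rows and abs(y - rows[-1][0]) <= row_tolerance:
            rows[-1][1].append(line)
        else:
            rows.append((y, [line]))
    # Within a row, order by the x of the top-left corner and keep only the text.
    return [
        [entry[1][0] for entry in sorted(row, key=lambda e: e[0][0][0])]
        for _, row in rows
    ]


for row in reading_order(sample):
    print(" ".join(row))
```

The row tolerance is in pixels and would need tuning for images with larger or smaller text.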

EasyOCR is an open-source OCR project with more than 10,000 stars on GitHub (GitHub repo: EasyOCR). It supports more than 80 languages and is highly accurate.

The Python library is installed with:

pip install easyocr

The image to recognize:
[image]
Code:

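import easyocr

# Load the Simplified Chinese and English models; run on CPU
reader = easyocr.Reader(['ch_sim','en'], gpu = False)
# detail=0 returns only the recognized text strings
result = reader.readtext(r"d:\Desktop\4A34A16F-6B12-4ffc-88C6-FC86E4DF6912.png", detail = 0)
print(result)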

The first run downloads the detection and recognition models, so it is best done on a reasonably fast connection:

Using CPU. Note: This module is much faster with a GPU.
Downloading detection model, please wait. This may take several minutes depending upon your network connection.
Downloading recognition model, please wait. This may take several minutes depending upon your network connection.

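The recognition output is shown below. No text region is missed: it picks up 链接 and 注释, which do not appear in the PaddleOCR output above, so its coverage is better here. A few characters are misrecognized, though, such as 甘犄图 for 甘特图 and Mernaid for Mermaid: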

['帮助文档', '快捷键', '目录', '标题', '文本样式', '列表', '链接', '代码片', '表格', '注脚', '注释', '自定义列表', 'LaTex 数学公式', '插入甘犄图', '插入UML图', '插入Mernaid流程图', '插入 Flowchart流程图', '插入类图']
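
To see exactly where the two libraries disagree on this image, the two text lists can be compared directly. This is only an illustrative sketch: `normalize` and `only_in` are helper names made up here, and the comparison ignores spaces and letter case, which is a simplifying assumption.

```python
def normalize(text):
    """Ignore spaces and letter case so 'LaTex 数学公式' matches 'LaTeX数学公式'."""
    return text.replace(" ", "").lower()


def only_in(first, second):
    """Return the items of `first` that have no normalized match in `second`."""
    seen = {normalize(t) for t in second}
    return [t for t in first if normalize(t) not in seen]


# Texts copied from the PaddleOCR and EasyOCR outputs above
paddle_texts = ['帮助文档', 'X', '目录', '标题', '快捷键', '文本样式列表', '代码片', '表格',
                '注脚', '自定义列表', 'LaTeX数学公式', '插入甘特图', '插入UML图',
                '插入Mermaid流程图', '插入Flowchart流程图', '插入类图']
easy_texts = ['帮助文档', '快捷键', '目录', '标题', '文本样式', '列表', '链接', '代码片', '表格',
              '注脚', '注释', '自定义列表', 'LaTex 数学公式', '插入甘犄图', '插入UML图',
              '插入Mernaid流程图', '插入 Flowchart流程图', '插入类图']

print("Only in EasyOCR:  ", only_in(easy_texts, paddle_texts))
print("Only in PaddleOCR:", only_in(paddle_texts, easy_texts))
```

Items that show up on only one side are either regions the other library missed or strings the two libraries read differently.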

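muggle_ocr is a lightweight OCR library. As the name suggests, it is designed for muggles! It is very simple to use, but its strength is recognizing all kinds of CAPTCHAs; for general text extraction the results are somewhat worse.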

The Python library is installed with:

pip install muggle_ocr

The CAPTCHA to recognize:
[image]

Code:

import muggle_ocr

# ModelType.Captcha selects the CAPTCHA model
sdk = muggle_ocr.SDK(model_type=muggle_ocr.ModelType.Captcha)

# Read the image as raw bytes
with open(r"d:\Desktop\四位验证码.png", "rb") as f:
    img = f.read()

text = sdk.predict(image_bytes=img)
print(text)

Recognition output:

MuggleOCR Session [captcha] Loaded.
3n3d

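ddddocr is another open-source library for CAPTCHA recognition, also known as 带带弟弟OCR. It was developed by sml2h3, a well-known name in the web-scraping community. Its recognition quality is also very good, and it works especially well on ordinary digit and letter CAPTCHAs.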

The Python library is installed with:

pip install ddddocr

The CAPTCHA to recognize:
[image]

Code:

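import ddddocr

ocr = ddddocr.DdddOcr()

# Raw string so the backslashes in the Windows path are not treated as escapes
with open(r"d:\Desktop\四位验证码2.png", 'rb') as f:
    img_bytes = f.read()

# classification() takes the image bytes and returns the recognized text
res = ocr.classification(img_bytes)
print(res)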
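
Both CAPTCHA libraries take raw image bytes, so processing a whole folder of images, the use case mentioned at the start of this post, comes down to a small loop. The sketch below is one possible way to do it: `recognize_folder` is a made-up helper name, the folder path in the usage comments is a placeholder, and the recognizer is passed in as a plain function so either library can be plugged in.

```python
from pathlib import Path


def recognize_folder(folder, predict, suffixes=(".png", ".jpg", ".jpeg")):
    """Run predict(image_bytes) on every image in folder; return {filename: text}."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip sub-folders and files that are not images
        if path.is_file() and path.suffix.lower() in suffixes:
            results[path.name] = predict(path.read_bytes())
    return results


# Hypothetical usage with the libraries above:
# import ddddocr
# ocr = ddddocr.DdddOcr()
# print(recognize_folder(r"d:\Desktop\captchas", ocr.classification))
#
# import muggle_ocr
# sdk = muggle_ocr.SDK(model_type=muggle_ocr.ModelType.Captcha)
# print(recognize_folder(r"d:\Desktop\captchas", lambda b: sdk.predict(image_bytes=b)))
```

Creating the recognizer once and reusing it for every file avoids reloading the model for each image.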