How do I use Tesseract OCR to extract key-value pairs from an image?

Extracting key-value pairs from an image with Tesseract OCR takes two steps: run OCR to get the text, then parse that text. The following example code uses the pytesseract wrapper to extract text from an image and then uses a regular expression to extract pairs written as key: value.

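import re

import cv2  # OpenCV, needed for cv2.imread
import pytesseract

# Read image; cv2.imread returns None if the file cannot be read
img = cv2.imread('image.png')
if img is None:
    raise FileNotFoundError('Could not read image.png')

# Extract text from image (OpenCV loads BGR, pytesseract expects RGB)
text = pytesseract.image_to_string(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

# Extract key-value pairs of the form "key: value" (one word on each side)
pairs = re.findall(r'(\w+)\s*:\s*(\w+)', text)

# Print pairs
print(pairs)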

Output example

[('key1', 'value1'), ('key2', 'value2'), ('key3', 'value3')]

The code above does the following:

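Imports the libraries it uses: pytesseract for OCR and re for regular expressions. Reading the image with cv2.imread() also requires OpenCV (import cv2).
Reads the image from the file image.png.
Uses pytesseract.image_to_string() to extract the text from the image.
Uses a regular expression to extract key-value pairs from the text. The pattern (\w+)\s*:\s*(\w+) captures one word on each side of a colon, so a multi-word value such as John Smith is cut to John.
Prints the extracted pairs as a list of (key, value) tuples.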
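
For OCR output where keys or values contain spaces, a line-based pattern may work better. The following is a minimal sketch, assuming the recognized text has one key: value pair per line. The helper name parse_key_values and the sample text are hypothetical and are not part of pytesseract; in practice the text argument would be the string returned by pytesseract.image_to_string().

```python
import re

# Hypothetical pattern (an assumption, not from the original snippet):
# one "key: value" pair per line, split at the first colon on the line.
KEY_VALUE_RE = re.compile(
    r'^[ \t]*([^:\n]+?)[ \t]*:[ \t]*(.+?)[ \t]*$',
    re.MULTILINE,
)


def parse_key_values(text):
    """Return a dict built from every 'key: value' line in text."""
    return {key: value for key, value in KEY_VALUE_RE.findall(text)}


# Stand-in for OCR output; real text would come from pytesseract.
sample_text = (
    "Name: John Smith\n"
    "Invoice No: 12345\n"
    "Total : 99.50 USD\n"
    "\n"
    "no separator here\n"
)

result = parse_key_values(sample_text)
print(result)
```

Lines without a colon are skipped, and only the first colon on a line is treated as the separator, so a value such as 10:30 stays intact. Returning a dict keeps only the last value when a key repeats; keep the list of tuples from re.findall() instead if duplicates matter.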
