How do I use Tesseract OCR to extract key-value pairs from an image?
Extracting key-value pairs with Tesseract OCR takes two steps: run OCR to get the text, then parse that text. The following example uses pytesseract (a Python wrapper for Tesseract) together with OpenCV to extract text from an image, then uses a regular expression to extract pairs written in the form key: value.
import re

import cv2
import pytesseract

# Read image
img = cv2.imread('image.png')
# Extract text from image
text = pytesseract.image_to_string(img)
# Extract key-value pairs from text (each key and value is a single \w+ word)
pairs = re.findall(r'(\w+)\s*:\s*(\w+)', text)
# Print pairs
print(pairs)
Output example
[('key1', 'value1'), ('key2', 'value2'), ('key3', 'value3')]
The code above does the following:
- Imports the pytesseract and re libraries.
- Reads the image from the file image.png.
- Uses pytesseract.image_to_string() to extract the text from the image.
- Uses a regular expression to extract key-value pairs from the text.
- Prints out the extracted pairs.
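The regular expression in the example captures only single-word keys and values, so a line such as "Invoice No: 12345 USD" would be cut down to ('No', '12345'). As a minimal sketch, assuming the OCR output puts one pair per line and uses a colon as the separator, the parsing step could instead split each line at its first colon. The helper name parse_key_value_pairs and the sample text are hypothetical; the sample stands in for what pytesseract.image_to_string() might return.

```python
# Hypothetical helper, not part of pytesseract: parses OCR text line by line.
# Assumes one "key: value" pair per line, separated by the first colon.
def parse_key_value_pairs(text):
    pairs = {}
    for line in text.splitlines():
        # partition() splits at the first colon only, so values such as
        # "10:30" keep their own colons.
        key, sep, value = line.partition(':')
        key, value = key.strip(), value.strip()
        # Skip lines with no colon, or with an empty key or value.
        if sep and key and value:
            pairs[key] = value
    return pairs


# Sample text standing in for real OCR output (an assumption for illustration).
sample = (
    "Name: John Smith\n"
    "Invoice No : 12345\n"
    "\n"
    "Time: 10:30\n"
    "no separator here\n"
)

result = parse_key_value_pairs(sample)
print(result)
# {'Name': 'John Smith', 'Invoice No': '12345', 'Time': '10:30'}
```

To use it with the example above, pass the text variable instead of sample. Returning a dictionary means a repeated key keeps only its last value; collecting a list of tuples instead would preserve duplicates.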