How do I use Tesseract OCR to extract key-value pairs from an image?
Extracting key-value pairs from an image with Tesseract OCR takes two steps: run OCR to get the text, then parse that text. The following example uses the pytesseract wrapper to extract text from an image and a regular expression to pull out pairs written as key: value.
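# Requires the opencv-python and pytesseract packages, plus the Tesseract binary
import cv2
import pytesseract
import re
# Read the image from disk
img = cv2.imread('image.png')
# Run OCR and get the recognized text as a single string
text = pytesseract.image_to_string(img)
# Extract key-value pairs written as "key: value" (one word each)
pairs = re.findall(r'(\w+)\s*:\s*(\w+)', text)
# Print the list of (key, value) tuples
print(pairs)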
Output example
[('key1', 'value1'), ('key2', 'value2'), ('key3', 'value3')]
The code above does the following:
- Imports the required libraries, including pytesseract and re.
- Reads the image from the file image.png.
- Uses pytesseract.image_to_string() to extract the text from the image.
- Uses a regular expression to extract key-value pairs from the text.
- Prints out the extracted pairs.
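The pattern (\w+)\s*:\s*(\w+) captures only a single word on each side of the colon, so a line such as "Name: John Smith" yields ('Name', 'John'). The sketch below is one possible alternative, not part of the original snippet: it assumes the OCR text has one pair per line and takes everything after the first colon as the value. The names PAIR_RE and parse_pairs and the sample text are hypothetical, chosen only for illustration; in practice you would pass in the string returned by pytesseract.image_to_string().

```python
import re

# Hypothetical pattern: one "key: value" pair per line.
# The key is everything before the first colon; the value runs to the end of the line.
PAIR_RE = re.compile(r'^[ \t]*([^:\n]+?)[ \t]*:[ \t]*(.+?)[ \t]*$', re.MULTILINE)

def parse_pairs(text):
    """Return a dict of key-value pairs found in OCR text, one pair per line."""
    return {key: value for key, value in PAIR_RE.findall(text)}

# Sample text standing in for OCR output (made up for this example)
sample = "Name: John Smith\nInvoice No : 12345\nTotal: 42.50\nThank you"
pairs = parse_pairs(sample)
print(pairs)
# {'Name': 'John Smith', 'Invoice No': '12345', 'Total': '42.50'}
```

Lines without a colon, such as "Thank you", are skipped. Returning a dict means a repeated key keeps only its last value; if duplicates matter, keep the list of tuples from findall() instead.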