How do I use Tesseract OCR to extract key-value pairs from an image?
To extract key-value pairs from an image, first run Tesseract OCR on the image to get its text, then parse that text with a regular expression. The following example uses pytesseract (a Python wrapper for Tesseract) together with OpenCV to read the image, and matches pairs written as key: value where both the key and the value are single words.
import cv2  # OpenCV, needed for cv2.imread
import pytesseract
import re
# Read image
img = cv2.imread('image.png')
# Extract text from image
text = pytesseract.image_to_string(img)
# Extract key-value pairs from text (single-word key, colon, single-word value)
pairs = re.findall(r'(\w+)\s*:\s*(\w+)', text)
# Print pairs
print(pairs)
Output example
[('key1', 'value1'), ('key2', 'value2'), ('key3', 'value3')]
The code above does the following:
- Imports the pytesseract and re libraries.
- Reads the image from the file image.png.
- Uses pytesseract.image_to_string() to extract the text from the image.
- Uses a regular expression to extract key-value pairs from the text.
- Prints out the extracted pairs.
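The regular expression in the example only captures single-word keys and values, so a value such as "John Smith" or "99.50" would be cut short. Below is a minimal sketch of a more tolerant parsing step, assuming the OCR text contains one key: value pair per line. The helper name parse_key_value_pairs, the pattern KEY_VALUE_LINE, and the sample text are hypothetical and only for illustration; in practice the text would come from pytesseract.image_to_string().

```python
import re

# Hypothetical pattern: one "key: value" pair per line.
# The key is everything before the first colon; the value is the rest of the line.
KEY_VALUE_LINE = re.compile(
    r'^[ \t]*([^:\n]+?)[ \t]*:[ \t]*(.+?)[ \t]*$',
    re.MULTILINE,
)

def parse_key_value_pairs(text):
    """Return a dict of key -> value for every line that looks like 'key: value'."""
    return {key: value for key, value in KEY_VALUE_LINE.findall(text)}

# Stand-in for OCR output (assumed sample, not real Tesseract output)
sample_text = "Name: John Smith\nInvoice No: 12345\nTotal: 99.50\nno separator here\n"

pairs = parse_key_value_pairs(sample_text)
print(pairs)
# {'Name': 'John Smith', 'Invoice No': '12345', 'Total': '99.50'}
```

Lines without a colon are skipped, and only the first colon on a line is treated as the separator, so a value like 10:30 stays intact. OCR output is often noisy, so real documents may still need extra cleanup before parsing.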