9951 explained code solutions for 126 technologies


tesseract-ocrHow do I use Tesseract OCR to recognize number plates?


Tesseract OCR is a powerful optical character recognition (OCR) library used to extract text from images. It can be used to recognize number plates by first pre-processing the image to make it easier for Tesseract to detect the characters. This pre-processing can include converting the image to grayscale, blurring, thresholding, and morphological operations.

For example, the following code block uses the OpenCV library to convert the image to grayscale, blur it, and apply a threshold:

import cv2

# Read the image
image = cv2.imread('image.jpg')

# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Blur the image
blur = cv2.GaussianBlur(gray, (5,5), 0)

# Apply threshold
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

After pre-processing, Tesseract can be used to detect the characters on the number plate. Tesseract can be installed using the PyPi package, and the following code block can be used to detect the characters:

import pytesseract

# Run tesseract OCR on the pre-processed image
text = pytesseract.image_to_string(thresh)

# Print recognized characters
print(text)

# Output: XYZ-12345

Code explanation

  1. import cv2: imports the OpenCV library.
  2. image = cv2.imread('image.jpg'): reads the image.
  3. gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY): converts the image to grayscale.
  4. blur = cv2.GaussianBlur(gray, (5,5), 0): blurs the image.
  5. thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]: applies a threshold to the image.
  6. import pytesseract: imports the Tesseract library.
  7. text = pytesseract.image_to_string(thresh): runs Tesseract OCR on the pre-processed image.
  8. print(text): prints the recognized characters.

Edit this code on GitHub