9951 explained code solutions for 126 technologies


tesseract-ocrHow do I use the Tesseract OCR engine with Python?


Tesseract OCR engine can be used with Python through the pytesseract package. This package provides a wrapper for the Tesseract API, allowing us to use it with Python. To install pytesseract, use the following command:

pip install pytesseract

Once installed, you can use the Tesseract engine with Python by following the example code below:

import pytesseract
from PIL import Image

image = Image.open('example.png')
text = pytesseract.image_to_string(image)
print(text)

This code will open the example.png image, feed it to the Tesseract engine, and print the extracted text.

The code consists of the following parts:

  1. import pytesseract: Imports the pytesseract package to access the Tesseract engine.
  2. from PIL import Image: Imports the Image module from the Python Imaging Library (PIL) package to open the image.
  3. image = Image.open('example.png'): Opens the image with the file name example.png.
  4. text = pytesseract.image_to_string(image): Feeds the opened image to the Tesseract engine and stores the extracted text in the text variable.
  5. print(text): Prints the extracted text.

For more information, please refer to the official pytesseract documentation: https://pypi.org/project/pytesseract/

Edit this code on GitHub