

How can I use Tesseract OCR to extract text from an image?


Tesseract OCR is an open-source optical character recognition (OCR) engine that extracts text from images. The steps below drive it from Python through pytesseract, a wrapper that calls the installed tesseract command-line program. To extract text from an image:

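  1. Install the Tesseract OCR engine on your computer, then install the pytesseract wrapper (for example with pip install pytesseract).
  2. Import the pytesseract module.
    import pytesseract
  3. Pass the image file path to the image_to_string() function, which returns the recognized text as a string.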
    text = pytesseract.image_to_string('image.jpg')
    print(text)

    Output example:

    This is some example text
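The three steps above can be wrapped in a small reusable function. This is a sketch and not part of pytesseract: the name ocr_file and its tesseract_cmd parameter are hypothetical. It checks that the file exists before doing any OCR and opens the image with Pillow, which pytesseract already depends on. It can also optionally set pytesseract.pytesseract.tesseract_cmd for installs where the tesseract binary is not on the PATH.

```python
from pathlib import Path
from typing import Optional


def ocr_file(path: str, tesseract_cmd: Optional[str] = None) -> str:
    """Return the text Tesseract recognizes in the image at `path` (hypothetical helper)."""
    image_path = Path(path)
    # Fail early with a clear message instead of an opaque Tesseract error.
    if not image_path.is_file():
        raise FileNotFoundError(f"No such image file: {image_path}")

    # Imported here so the path check above works even before the
    # OCR packages (pytesseract, Pillow) are installed.
    import pytesseract
    from PIL import Image

    if tesseract_cmd:
        # Point pytesseract at the tesseract binary when it is not on PATH.
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd

    # Pillow decodes the image; pytesseract hands it to the tesseract binary.
    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)
```

For example, print(ocr_file('image.jpg')) prints the recognized text. On Windows you might pass the full path to tesseract.exe as tesseract_cmd. The exact location depends on where Tesseract was installed. If pytesseract still cannot find the binary, it raises pytesseract.TesseractNotFoundError. The imports sit inside the function only so that the file check runs without the OCR packages installed. Moving them to the top of the module works just as well.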
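To control how Tesseract reads the image, pass extra options:
  1. Set the language of the text in the image, if necessary. eng is English; each language needs its Tesseract language data installed.
    text = pytesseract.image_to_string('image.jpg', lang='eng')
  2. Set the OCR engine mode, if necessary. image_to_string() has no oem keyword argument, so the mode is passed to Tesseract through the config string (1 selects the LSTM neural-network engine).
    text = pytesseract.image_to_string('image.jpg', lang='eng', config='--oem 1')
  3. Get the text from the image. With no extra arguments, Tesseract falls back to English and its default engine mode.
    text = pytesseract.image_to_string('image.jpg')
  4. Print the extracted text.
    print(text)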

    Output example:

    This is some example text
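image_to_string() forwards engine settings through the config string, so it can help to build that string in one place. The sketch below sets the OCR engine mode (--oem) and, optionally, the page segmentation mode (--psm). Both are standard Tesseract command-line options. build_config and extract_text are hypothetical helper names, not pytesseract APIs.

```python
from typing import Optional


def build_config(oem: Optional[int] = None, psm: Optional[int] = None) -> str:
    """Build extra Tesseract command-line options for pytesseract's `config` argument."""
    options = []
    if oem is not None:
        # OCR Engine Mode: 0 = legacy, 1 = LSTM only, 2 = legacy + LSTM, 3 = default.
        if oem not in range(4):
            raise ValueError(f"oem must be between 0 and 3, got {oem}")
        options.append(f"--oem {oem}")
    if psm is not None:
        # Page Segmentation Mode: 0-13 (3 = fully automatic, Tesseract's default).
        if psm not in range(14):
            raise ValueError(f"psm must be between 0 and 13, got {psm}")
        options.append(f"--psm {psm}")
    return " ".join(options)


def extract_text(path: str, lang: str = "eng",
                 oem: Optional[int] = None, psm: Optional[int] = None) -> str:
    """OCR the image at `path` with the given language(s) and modes (hypothetical helper)."""
    import pytesseract  # lazy import: build_config() itself needs no dependencies

    return pytesseract.image_to_string(path, lang=lang, config=build_config(oem, psm))
```

For example, extract_text('image.jpg', lang='eng', oem=1, psm=6) runs the LSTM engine and treats the image as a single uniform block of text. Tesseract also accepts several languages joined with +, such as lang='eng+fra', provided the data for each language is installed.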
