9951 explained code solutions for 126 technologies


tesseract-ocrHow can I use Tesseract OCR to recognize multiple languages?


Tesseract OCR is an open source optical character recognition (OCR) library that can be used to recognize multiple languages. It supports over 100 languages and can be used for a wide variety of applications. Here is an example of how to use Tesseract OCR to recognize multiple languages:

# Import the necessary libraries
from PIL import Image
import pytesseract

# Read the image file
image = Image.open('image.jpg')

# Set the language to be recognized
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
text = pytesseract.image_to_string(image, lang='eng+spa')

# Print the recognized text
print(text)

The code above will output the recognized text in both English and Spanish. It is important to note that the lang parameter should be set to the languages you want to recognize. You can also specify multiple languages by separating them with a + sign. For example, lang='eng+spa+deu' will recognize text in English, Spanish, and German.

The following are the parts of the code and their explanations:

  1. from PIL import Image - This imports the Image module from the PIL library, which is used for manipulating images.
  2. import pytesseract - This imports the pytesseract library, which is used to call the Tesseract OCR engine.
  3. image = Image.open('image.jpg') - This reads the image file.
  4. pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe" - This sets the location of the Tesseract OCR engine.
  5. text = pytesseract.image_to_string(image, lang='eng+spa') - This calls the Tesseract OCR engine with the specified language.
  6. print(text) - This prints the recognized text.

For more information on Tesseract OCR and how to use it, please refer to the following links:

Edit this code on GitHub