

How do I install and use Tesseract OCR on Ubuntu?


  1. Install Tesseract OCR on Ubuntu:

    sudo apt-get install tesseract-ocr
  2. Install additional language packs:

    sudo apt-get install tesseract-ocr-<lang>

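    Replace <lang> with the code for the language you want to use. Tesseract language codes are usually three letters (for example, deu for German or fra for French). The English pack, tesseract-ocr-eng, is normally installed together with tesseract-ocr. To add German, for example, use sudo apt-get install tesseract-ocr-deu.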

  3. Verify the installation:

    tesseract --version

    Example output (the exact versions depend on your Ubuntu release):

    tesseract 4.1.1
    leptonica-1.78.0
    libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
  4. Run Tesseract OCR on an image:

    tesseract image.png output

    This creates a text file, output.txt, containing the recognized text. Tesseract appends the .txt extension to the output base name you pass.

  5. Improve Tesseract OCR accuracy: Recognition quality depends on the trained model (<lang>.traineddata) used for the language. Higher-accuracy models are published in the tessdata_best repository on the Tesseract OCR GitHub page. Faster, smaller models are published in tessdata_fast.

  6. Use Tesseract OCR from a programming language: Tesseract provides a C++ API (TessBaseAPI), and wrappers exist for many other languages, for example pytesseract for Python and Tess4J for Java. The Tesseract OCR documentation (formerly the GitHub wiki) links to these APIs and wrappers.

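For step 2, it helps to know which language packs Ubuntu offers and which languages Tesseract can already load. The shell sketch below shows one way to check both. `apt-cache search --names-only` and `tesseract --list-langs` are standard commands. `has_tess_lang` is a hypothetical helper, and it assumes that `--list-langs` prints a header line followed by one language code per line.

```shell
#!/usr/bin/env bash
# Show the Tesseract language packs offered by the Ubuntu repositories:
#   apt-cache search --names-only '^tesseract-ocr-'
# Show the languages Tesseract can load right now:
#   tesseract --list-langs

# Hypothetical helper: succeed (exit status 0) if Tesseract lists the
# given language code, e.g. "eng" or "deu".
has_tess_lang() {
  # Assumption: --list-langs prints a header line, then one code per line.
  # stderr is merged in case a build prints the list there; -x matches
  # whole lines only, so the header line never counts as a match.
  tesseract --list-langs 2>&1 | grep -qx -- "$1"
}

# Example: install the German pack only if it is missing.
#   has_tess_lang deu || sudo apt-get install tesseract-ocr-deu
```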
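For step 3, a script may need to confirm that Tesseract is installed and read its major version. One reason is that Tesseract 3.x spells the page segmentation option `-psm`, while 4.x and later use `--psm`. The sketch below shows one approach. `tess_major_version` is a hypothetical helper, and it assumes the first line of `tesseract --version` has the form `tesseract 4.1.1`, as in the example output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print Tesseract's major version (e.g. "4"),
# or return 1 if the tesseract command is not available.
tess_major_version() {
  if ! command -v tesseract >/dev/null 2>&1; then
    echo "tesseract not found; install it with: sudo apt-get install tesseract-ocr" >&2
    return 1
  fi
  # Assumption: the first line looks like "tesseract 4.1.1". stderr is merged
  # in case a build prints the version there instead of stdout.
  tesseract --version 2>&1 | head -n 1 | awk '{ split($2, v, "."); print v[1] }'
}

# Example: pick the option spelling for the installed version.
#   if [ "$(tess_major_version)" -ge 4 ]; then psm_opt=--psm; else psm_opt=-psm; fi
```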
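For step 4, the same command also accepts a language (`-l`), a page segmentation mode (`--psm`), and `stdout` in place of the output base name. The sketch below shows these options, along with a loop that processes many images. `ocr_dir` is a hypothetical helper, and it assumes the images are PNG files in a single directory.

```shell
#!/usr/bin/env bash
# Print the text to the terminal instead of writing output.txt,
# using the German and English models together:
#   tesseract image.png stdout -l deu+eng
# Treat the image as a single uniform block of text (page segmentation mode 6):
#   tesseract image.png output --psm 6

# Hypothetical helper: OCR every PNG in a directory, writing <name>.txt
# next to each image. Arguments: directory (default: .), language (default: eng).
ocr_dir() {
  local dir="${1:-.}" lang="${2:-eng}" img
  for img in "$dir"/*.png; do
    [ -e "$img" ] || continue   # no matches: skip the unexpanded pattern
    # Tesseract appends .txt to the output base name.
    tesseract "$img" "${img%.png}" -l "$lang" || echo "OCR failed: $img" >&2
  done
}

# Example:
#   ocr_dir ./scans eng
```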
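For step 5, one approach is to download a higher-accuracy model into a private directory and point Tesseract at it with `--tessdata-dir`, leaving the system packages untouched. This is a sketch under several assumptions:

- The URL assumes the `tesseract-ocr/tessdata_best` repository on GitHub and a `main` branch. Verify both before relying on it.
- `wget` must be installed.
- `ocr_best` and `TESSDATA_BEST_DIR` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical location for downloaded models (set it before sourcing to override).
TESSDATA_BEST_DIR="${TESSDATA_BEST_DIR:-$HOME/tessdata_best}"
# Assumption: repository and branch name; verify on GitHub before use.
BEST_URL="https://github.com/tesseract-ocr/tessdata_best/raw/main"

# Hypothetical helper: fetch <lang>.traineddata once, then OCR an image with it.
# Arguments: image, output base name, language code (default: eng).
ocr_best() {
  local img="$1" out="$2" lang="${3:-eng}"
  local model="$TESSDATA_BEST_DIR/$lang.traineddata"
  mkdir -p "$TESSDATA_BEST_DIR" || return 1
  if [ ! -s "$model" ]; then
    # Download only when the model is missing or empty; drop partial files on failure.
    wget -q -O "$model" "$BEST_URL/$lang.traineddata" || { rm -f "$model"; return 1; }
  fi
  tesseract "$img" "$out" -l "$lang" --tessdata-dir "$TESSDATA_BEST_DIR"
}

# Example:
#   ocr_best image.png output eng
```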
