9951 explained code solutions for 126 technologies


tesseract-ocrHow can I use Tesseract OCR in a Java application?


Tesseract OCR is an open source optical character recognition library written in C++. It can be used in Java applications by using a Java wrapper such as Tess4J. This wrapper provides a simple interface to the Tesseract API.

Example code using Tess4J:

// Import the Tess4J library
import net.sourceforge.tess4j.*;

public class MyClass {
    public static void main(String[] args) {
        // Create an instance of Tesseract
        Tesseract instance = new Tesseract();
        // Set the path to the tessdata folder
        instance.setDatapath("/path/to/tessdata");
        // Set the language
        instance.setLanguage("eng");
        // Recognize text from image
        try {
            String result = instance.doOCR(new File("/path/to/image"));
            System.out.println(result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
    }
}

The code above will output the text recognized from the image provided.

The code consists of the following parts:

  • Import the Tess4J library: import net.sourceforge.tess4j.*;
  • Create an instance of Tesseract: Tesseract instance = new Tesseract();
  • Set the path to the tessdata folder: instance.setDatapath("/path/to/tessdata");
  • Set the language: instance.setLanguage("eng");
  • Recognize text from image: String result = instance.doOCR(new File("/path/to/image"));
  • Output the result: System.out.println(result);

Helpful links

Edit this code on GitHub