

How do I create a traineddata file for Tesseract OCR?


Creating a traineddata file for Tesseract OCR with the legacy (3.x-style) training tools takes four steps:

  1. Create a font_properties file. Each line describes one font: the font name (no spaces) followed by five 0/1 flags that say whether the font is italic, bold, fixed-pitch, serif, and fraktur.
fontname italic bold fixed serif fraktur

## Example

Roboto 0 0 0 0 0
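  2. Generate a box file. This file lists each character found in a training image together with the coordinates of its bounding box. It is generated from the training image (a TIFF of text rendered in the font), and usually needs to be corrected by hand afterwards.
tesseract fontname.exp0.tif fontname.exp0 batch.nochop makebox

## Example

tesseract Roboto.exp0.tif Roboto.exp0 batch.nochop makebox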
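  3. Generate the traineddata file. Once training on the box file has produced the component files (for example fontname.unicharset, fontname.inttemp, fontname.normproto, and fontname.pffmtable), combine_tessdata merges every file that starts with the fontname. prefix into fontname.traineddata. The trailing dot is part of the argument. The -e flag does the opposite: it extracts components from an existing traineddata file.
combine_tessdata fontname.

## Example

combine_tessdata Roboto.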
  4. Test the traineddata file. This step is optional but recommended, since it confirms that Tesseract can load the new file and recognize text with it.
tesseract --tessdata-dir . fontname.exp0.tif fontname.exp0 -l fontname

## Example

tesseract --tessdata-dir . Roboto.exp0.tif Roboto.exp0 -l Roboto

The output is a text file, fontname.exp0.txt (Roboto.exp0.txt in the example), containing the text recognized in the image.
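
The steps above leave out the training stage that sits between the box file and combine_tessdata. The following is a rough sketch of one possible end-to-end command sequence, not a definitive recipe. It assumes the legacy Tesseract 3.x training tools (unicharset_extractor, mftraining, cntraining) are installed and that the training image follows the lang.fontname.expN.tif naming convention. The function name training_commands and the language code rob are made up for illustration. The function only prints the commands, so nothing is run until you choose to.

```shell
#!/bin/sh
# Print a legacy (Tesseract 3.x) training command sequence for one
# language/font pair. Nothing is executed; review the output first.
# Assumption: the training image is named <lang>.<font>.exp0.tif.
training_commands() {
    lang="$1"   # language code for the new traineddata (hypothetical: "rob")
    font="$2"   # font name as it will appear in font_properties
    base="$lang.$font.exp0"
    cat <<EOF
echo "$font 0 0 0 0 0" > font_properties
tesseract $base.tif $base batch.nochop makebox
# correct $base.box by hand before continuing
tesseract $base.tif $base box.train
unicharset_extractor $base.box
mftraining -F font_properties -U unicharset -O $lang.unicharset $base.tr
cntraining $base.tr
mv inttemp $lang.inttemp
mv normproto $lang.normproto
mv pffmtable $lang.pffmtable
mv shapetable $lang.shapetable
combine_tessdata $lang.
tesseract --tessdata-dir . $base.tif $base -l $lang
EOF
}

# Example: show the commands for a made-up language code and the Roboto font.
training_commands rob Roboto
```

Printing rather than executing is deliberate: the box file normally needs manual correction between makebox and box.train, so running the whole sequence unattended would train on uncorrected boxes. The all-zero flags written to font_properties describe a regular, proportional, sans-serif font; adjust them for other styles. The shapetable file may not be produced by older tool versions, in which case that mv line can be dropped.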
