How do I configure the output format of Tesseract OCR?
```python
# tesserocr is a Python wrapper around Tesseract's C++ TessBaseAPI (pip install tesserocr)
from tesserocr import PyTessBaseAPI, PSM

# The context manager releases the engine (End()) when the block exits
with PyTessBaseAPI() as api:
    # Fully automatic page segmentation
    api.SetPageSegMode(PSM.AUTO)
    # Load the input image
    api.SetImageFile('my_image.png')
    # Run recognition once; the getter below reuses the result
    api.Recognize()
    # The output format is chosen by the getter: GetHOCRText returns hOCR
    # for page number 0, while GetUTF8Text would return plain text
    hocr = api.GetHOCRText(0)

# Print the hOCR output
print(hocr)
```
The example above produces Tesseract output in hOCR, an HTML-based format that records the recognized text together with layout information such as bounding boxes and word confidences. It runs Tesseract on the image file my_image.png and prints the result. Tesseract's API has no single output-format setting. The format is selected by the method you call to retrieve the result after recognition.
The code consists of the following parts:
- `from tesserocr import PyTessBaseAPI, PSM`: imports the API class and the page segmentation mode constants.
- `with PyTessBaseAPI() as api:`: creates an API instance and releases it when the block ends.
- `api.SetPageSegMode(PSM.AUTO)`: sets the page segmentation mode to fully automatic.
- `api.SetImageFile('my_image.png')`: sets the image file to recognize.
- `api.Recognize()`: runs Tesseract on the image.
- `hocr = api.GetHOCRText(0)`: retrieves the result in hOCR format for page 0. Calling `api.GetUTF8Text()` instead returns plain UTF-8 text.
- `print(hocr)`: prints the output.
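hOCR is most useful once you pull the words and their positions back out of it. The following is a minimal sketch that uses only Python's standard library (`html.parser`). It assumes Tesseract's usual hOCR layout, where each word is a `span` with class `ocrx_word` and a `title` attribute such as `bbox 36 92 96 116; x_wconf 95`. It also assumes that word spans contain no nested `span` elements, which excludes optional per-character boxes. `HocrWordParser` and `parse_hocr_words` are hypothetical names. `SAMPLE_HOCR` is a hand-written, abbreviated example, not real Tesseract output.

```python
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collects (text, bbox, confidence) for each hOCR word span.

    Assumption: words are <span class="ocrx_word"> elements with no nested spans.
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._props = None  # (bbox, conf) of the word currently open, else None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag != "span" or "ocrx_word" not in classes:
            return
        bbox, conf = None, None
        # hOCR stores properties in the title as "name values; name values; ..."
        for prop in (attrs.get("title") or "").split(";"):
            parts = prop.split()
            if not parts:
                continue
            if parts[0] == "bbox":
                # x0 y0 x1 y1 = left, top, right, bottom in image pixels
                bbox = tuple(int(v) for v in parts[1:5])
            elif parts[0] == "x_wconf":
                conf = float(parts[1])
        self._props = (bbox, conf)
        self._text = []

    def handle_data(self, data):
        # Only collect text while inside a word span
        if self._props is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # The first </span> after a word starts closes that word
        if tag == "span" and self._props is not None:
            bbox, conf = self._props
            self.words.append(("".join(self._text).strip(), bbox, conf))
            self._props = None


def parse_hocr_words(hocr):
    """Hypothetical helper: return [(text, (x0, y0, x1, y1), conf), ...]."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


# Hand-written, abbreviated sample (not actual Tesseract output)
SAMPLE_HOCR = """
<div class='ocr_page' id='page_1' title='bbox 0 0 400 200; ppageno 0'>
 <span class='ocr_line' id='line_1_1' title='bbox 36 92 218 116'>
  <span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>Hello</span>
  <span class='ocrx_word' id='word_1_2' title='bbox 108 92 218 116; x_wconf 91'>world</span>
 </span>
</div>
"""

for text, bbox, conf in parse_hocr_words(SAMPLE_HOCR):
    print(text, bbox, conf)
```

In practice, you would pass `parse_hocr_words` the string returned by `GetHOCRText` or the contents of an `.hocr` file written by the `tesseract` command-line tool. A full HTML parser is used rather than regular expressions because hOCR attribute quoting and inline formatting tags vary between Tesseract versions.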
More on Tesseract OCR
- How can I use Tesseract OCR with Xamarin Forms?
- How do I add Tesseract OCR to my environment variables?
- How can I use Python to get the coordinates of words detected by Tesseract OCR?
- How do I set the Windows path for Tesseract OCR?
- How do I install Tesseract OCR on Windows?
- How can I integrate Tesseract OCR into a Unity project?
- How can I use Tesseract OCR with Kubernetes?
- How do I install Tesseract-OCR using Yum?
- How do Tesseract OCR and EasyOCR compare in terms of accuracy and speed of text recognition?
- How can I use Tesseract OCR with Xamarin?