tesseract-ocr
How do I configure the output format of tesseract OCR?
# import the tesserocr bindings (assumes: pip install tesserocr)
from tesserocr import PyTessBaseAPI, PSM
# create the API with automatic page segmentation
with PyTessBaseAPI(psm=PSM.AUTO) as api:
    # run tesseract with image file
    api.SetImageFile('my_image.png')
    api.Recognize()
    # get output as hOCR (0 is the page number)
    hocr = api.GetHOCRText(0)
# print output
print(hocr)
The above example code uses the tesserocr Python bindings to produce hOCR output from tesseract OCR. hOCR is an open format that embeds OCR results, such as the recognized text and its bounding boxes, in HTML. The Tesseract API has no single output-format switch: the format is chosen by the method used to fetch the result. The code runs tesseract on the image file my_image.png and prints the hOCR markup.
The code consists of the following parts:
- from tesserocr import PyTessBaseAPI, PSM: This imports the API class and the page segmentation mode constants.
- with PyTessBaseAPI(psm=PSM.AUTO) as api: This creates an instance of the API with the page segmentation mode set to auto and releases it when the block ends.
- api.SetImageFile('my_image.png'): This sets the image file to the given file.
- api.Recognize(): This runs tesseract on the given image file.
- hocr = api.GetHOCRText(0): This gets the output of page 0 as hOCR. Calling api.GetUTF8Text() instead returns plain UTF-8 text.
- print(hocr): This prints the output.
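The output format can also be chosen when calling the tesseract command-line program, where it is given as a trailing config name such as hocr, pdf, tsv or txt. Below is a minimal sketch of that approach. It assumes Tesseract 4 or later is installed and on the PATH; the function names build_tesseract_command and run_tesseract are hypothetical helpers made up for this example, not part of any library.

```python
import subprocess

# Output config names shipped with Tesseract.
# Assumption: a 4.1+ install; older builds may not include "alto".
KNOWN_FORMATS = {"txt", "hocr", "pdf", "tsv", "alto"}


def build_tesseract_command(image_path, output_base, fmt="hocr", psm=3):
    """Build the argument list: tesseract <image> <outputbase> --psm N <config>.

    Hypothetical helper. psm=3 is fully automatic page segmentation.
    """
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"unsupported output format: {fmt!r}")
    return ["tesseract", image_path, output_base, "--psm", str(psm), fmt]


def run_tesseract(image_path, output_base, fmt="hocr", psm=3):
    """Run Tesseract; it writes the result to a file named after output_base."""
    command = build_tesseract_command(image_path, output_base, fmt, psm)
    subprocess.run(command, check=True)
```

For example, run_tesseract('my_image.png', 'out') should write the hOCR result to out.hocr, and passing fmt='pdf' should write out.pdf instead.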