In this article, we want to teach you how to set up Tesseract OCR on Debian 11.
Tesseract is an open-source optical character recognition (OCR) engine and one of the most popular and accurate OCR libraries.
OCR uses artificial intelligence to find and recognize text in images.
Tesseract looks for patterns in pixels, letters, words, and sentences. It uses a two-pass approach called adaptive recognition: the first pass recognizes characters, and the second pass fills in any letters it was not sure of with letters that fit the word or sentence context.
How To Set up Tesseract OCR on Debian 11
Before you start to set up Tesseract OCR, you need to log in to your server as a non-root user with sudo privileges. To do this, you can follow our article on the Initial Server Setup with Debian 11.
Then, follow the steps below to install Tesseract OCR on Debian 11.
Install Tesseract OCR on Debian 11 with APT
Tesseract OCR is available in the default Debian repository.
First, you need to update your local package index with the following command:
sudo apt update -y
Then, use the command below to install Tesseract OCR on Debian 11:
sudo apt install tesseract-ocr
The Tesseract language data files are installed under /usr/share/tesseract-ocr/4.00/tessdata.
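To confirm that the package installed correctly, a quick check like the following may help. It is a minimal sketch: check_tesseract is a hypothetical helper name, while --version and --list-langs are standard Tesseract options.

```shell
#!/bin/sh
# check_tesseract is a hypothetical helper, not part of Tesseract itself.
check_tesseract() {
    if command -v tesseract >/dev/null 2>&1; then
        # Print the engine version, then the installed language packs.
        tesseract --version
        tesseract --list-langs
    else
        echo "tesseract not found on PATH"
        return 1
    fi
}

# Run the check; do not abort the shell if Tesseract is missing.
check_tesseract || true
```

If a language you need is not listed, it can usually be added from the Debian repository as a separate tesseract-ocr-<lang> package.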
The convert command is useful for preparing images for OCR. It can convert between image formats and resize, blur, crop, despeckle, dither, draw on, flip, join, and re-sample images, and more.
This tool is provided by ImageMagick. To install it, run the command below:
sudo apt install imagemagick
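Scanned or low-resolution images often benefit from some cleanup before OCR. The sketch below shows one possible preprocessing step with convert. The function name preprocess_image, the file names, and the choice of options (grayscale, 200% upscale, despeckle) are illustrative assumptions, not settings prescribed by this guide.

```shell
#!/bin/sh
# preprocess_image is a hypothetical helper: SRC is the input image,
# DST is the cleaned-up copy to hand to Tesseract.
preprocess_image() {
    src=$1
    dst=$2
    # Convert to grayscale, double the size, and remove speckle noise.
    convert "$src" -colorspace Gray -resize 200% -despeckle "$dst"
}

# Only run when both file names are given on the command line.
if [ "$#" -ge 2 ]; then
    preprocess_image "$1" "$2"
fi
```

Which options help depends on the image, so it is worth comparing the OCR output with and without preprocessing.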
Now you can test your Tesseract installation.
To do this, find an image that contains text and then execute the following command:
tesseract <image_name> <output_file_name>
Tesseract extracts the text from the image and writes it to <output_file_name>.txt. It works best on clean, printed text. To recognize handwriting, you have to train it.
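The test above can be wrapped in a small script, sketched here under a few assumptions: ocr_to_text is a hypothetical helper name, and English (-l eng) is assumed as the language of the image.

```shell
#!/bin/sh
# ocr_to_text is a hypothetical helper: IMAGE is the file to read,
# OUT_BASE is the output name without an extension.
ocr_to_text() {
    image=$1
    out_base=$2
    if [ ! -f "$image" ]; then
        echo "no such image: $image" >&2
        return 1
    fi
    # Tesseract appends .txt to the output base name.
    tesseract "$image" "$out_base" -l eng && cat "$out_base.txt"
}

# Only run when both arguments are given on the command line.
if [ "$#" -ge 2 ]; then
    ocr_to_text "$1" "$2"
fi
```

Checking for the file first gives a clearer error than letting Tesseract fail on a missing image.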
Install Tesseract OCR from the source
Another way to install Tesseract OCR is to build it from source. This method works on all Linux distros.
Install the required packages on Debian 11 with the command below:
sudo apt install automake ca-certificates g++ git libtool libleptonica-dev make pkg-config libpango1.0-dev
Get Tesseract OCR with the command below:
git clone https://github.com/tesseract-ocr/tesseract.git
Then, switch to your Tesseract directory with the command below:
cd tesseract
Now you need to generate the Tesseract build files on Debian 11. To do this, run the script below:
./autogen.sh
Next, configure the build by running the script below:
./configure
When you are done, start compiling Tesseract on Debian 11 with the following command:
make
This will take some time to complete.
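Because compilation can be slow, running several make jobs in parallel may shorten it. This is a minimal sketch, assuming GNU make and the coreutils nproc command are available. build_parallel is a hypothetical helper name.

```shell
#!/bin/sh
# build_parallel is a hypothetical helper that runs make with one job per CPU.
build_parallel() {
    # Fall back to a single job if nproc is unavailable.
    njobs=$(nproc 2>/dev/null || echo 1)
    echo "building with $njobs parallel jobs"
    make -j"$njobs"
}

# Only build when run from a directory that has a Makefile.
if [ -f Makefile ]; then
    build_parallel
fi
```

On a machine with little memory, a lower job count may be safer than one job per CPU.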
Next, run the following command:
sudo make install
Then, run the ldconfig command to update the shared library cache:
sudo ldconfig
Now you need to compile the training tools. This step does not need root privileges. To do this, run the following command:
make training
Finally, run the following command:
sudo make training-install
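A build from source does not include language data, so Tesseract needs at least one .traineddata file before it can recognize text. The sketch below only checks whether one is present. It assumes the default install prefix /usr/local, which puts the data directory at /usr/local/share/tessdata, and check_traineddata is a hypothetical helper name.

```shell
#!/bin/sh
# check_traineddata is a hypothetical helper.
# Arguments: data directory (assumed default shown) and language code.
check_traineddata() {
    tessdata_dir=${1:-/usr/local/share/tessdata}
    lang=${2:-eng}
    if [ -f "$tessdata_dir/$lang.traineddata" ]; then
        echo "found $tessdata_dir/$lang.traineddata"
    else
        echo "missing $tessdata_dir/$lang.traineddata"
        return 1
    fi
}

# Report on the English data file in the assumed default location.
check_traineddata || true
```

If the file is missing, language data can be downloaded from the tessdata repositories of the tesseract-ocr project on GitHub and copied into that directory.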
For more information, you can visit the Tesseract OCR documentation.
At this point, you have learned how to set up Tesseract OCR on Debian 11.
We hope you enjoy using it.