How To Set up Tesseract OCR on Debian 11

In this article, we want to teach you how to set up Tesseract OCR on Debian 11.

Tesseract is an open-source optical character recognition (OCR) engine and one of the most popular and accurate OCR libraries available.

OCR uses artificial intelligence to find and recognize text in images.

Tesseract looks for patterns in pixels, letters, words, and sentences. It uses a two-step approach called adaptive recognition: it makes one pass over the data to recognize characters, then a second pass to fill in any letters it was not sure of with letters that match the word or sentence context.


Before you start to set up Tesseract OCR, you need to log in to your server as a non-root user with sudo privileges. To do this, you can follow our article on the Initial Server Setup with Debian 11.

Then, follow the steps below to install Tesseract OCR on Debian 11.

Install Tesseract OCR on Debian 11 with APT

Tesseract OCR is available in the default Debian repositories.

First, you need to update your local package index with the following command:

sudo apt update -y

Then, use the command below to install Tesseract OCR on Debian 11:

sudo apt install tesseract-ocr

The tesseract command is installed as /usr/bin/tesseract, and its language data (the tessdata directory) is stored under /usr/share/tesseract-ocr/4.00/tessdata.
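To confirm the installation, tesseract --version prints the installed version and tesseract --list-langs lists the available language models. As a small illustration of where those models live, the hypothetical bash function below (list_tessdata_langs is our own name, not part of Tesseract) lists the .traineddata files in a tessdata directory, defaulting to the Debian 11 path above. On a default install it typically shows eng and osd, which the tesseract-ocr package pulls in as dependencies.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Tesseract): list the language models
# (.traineddata files) in a tessdata directory. The default is the path
# used by the Debian 11 packages; pass another directory as $1 if needed.
list_tessdata_langs() {
  local dir="${1:-/usr/share/tesseract-ocr/4.00/tessdata}"
  local f
  for f in "$dir"/*.traineddata; do
    # If nothing matched, bash leaves the pattern unexpanded; skip it.
    [ -e "$f" ] || continue
    basename "$f" .traineddata
  done
}

# Usage on a machine with Tesseract installed:
#   tesseract --version      # installed version
#   tesseract --list-langs   # Tesseract's own list of languages
#   list_tessdata_langs      # the model files read from the directory
```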

The convert command is useful for converting between image formats and for resizing, blurring, cropping, despeckling, dithering, drawing on, flipping, joining, and re-sampling images, among other things.

This tool is provided by ImageMagick. To install it, run the command below:

sudo apt install imagemagick
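Images are often easier for Tesseract to read after some cleanup. The sketch below is a hypothetical helper (prep_for_ocr is our own name) that uses convert to make an image grayscale, enlarge it, and stretch its contrast before OCR. The 200% scale factor is an assumption; tune it for your own scans.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepare an image for OCR with ImageMagick's convert.
#   -colorspace Gray  drop color information
#   -resize 200%      upscale small text (assumed factor, tune as needed)
#   -normalize        stretch the contrast across the full range
# Expects the input name to have an extension, e.g. photo.jpg.
prep_for_ocr() {
  local in="$1"
  local out="${2:-${in%.*}-ocr.png}"   # default: photo.jpg -> photo-ocr.png
  convert "$in" -colorspace Gray -resize 200% -normalize "$out" && echo "$out"
}
```

For example, prep_for_ocr photo.jpg writes photo-ocr.png and prints its name, which you can then pass to tesseract.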

Now you can test Tesseract.

Find an image that contains text and run the following command:

tesseract <image_name> <output_name>
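Tesseract adds the .txt extension to the output name itself, and passing stdout as the output name prints the text to the terminal instead. The hypothetical ocr_dir function below (our own name, not a Tesseract command) builds on that to process every PNG, JPEG, and TIFF image in a directory, writing one .txt file next to each image. The -l option selects the language model; eng is assumed as the default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run Tesseract on every image in a directory.
# Tesseract appends ".txt" to the output base name, so scans/page1.png
# produces scans/page1.txt.
ocr_dir() {
  local dir="${1:-.}"      # directory containing the images
  local lang="${2:-eng}"   # language model passed to -l (assumed default)
  local img
  for img in "$dir"/*.png "$dir"/*.jpg "$dir"/*.jpeg "$dir"/*.tif "$dir"/*.tiff; do
    [ -e "$img" ] || continue   # skip patterns that matched no files
    # Stop at the first image Tesseract fails on.
    tesseract "$img" "${img%.*}" -l "$lang" || return 1
  done
}

# Single image, text printed to the terminal instead of a file:
#   tesseract scan.png stdout -l eng
# Whole directory, one .txt file per image:
#   ocr_dir ./scans eng
```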

Tesseract extracts the text from the image and saves it as a plain-text document, appending the .txt extension to the output name you give it. Its stock models are trained on printed text, so to make it recognize handwriting, you have to train it.

Install Tesseract OCR from the source

Another way to install Tesseract OCR is to build it from source. This method also works on other Linux distributions, although the names of the dependency packages can differ.

Install the required packages on Debian 11 with the command below:

sudo apt install automake ca-certificates g++ git libtool libleptonica-dev make pkg-config libpango1.0-dev
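Before running the build, you can check that the build tools are on your PATH. This is a minimal sketch: missing_tools is a hypothetical helper that prints each missing command and returns a non-zero status if any are absent. On Debian the libtool package provides the libtoolize command, and libraries such as Leptonica are checked through pkg-config rather than as commands.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: print every listed command that is not on
# PATH and return non-zero if any are missing. With no arguments it checks
# the build tools installed in the previous step.
missing_tools() {
  local tool status=0
  [ "$#" -gt 0 ] || set -- automake g++ git libtoolize make pkg-config
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   missing_tools && echo "build tools OK"
# Leptonica is a library; pkg-config can confirm its development files:
#   pkg-config --exists lept && echo "Leptonica OK"
```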

Clone the Tesseract source code from GitHub with the command below:

git clone https://github.com/tesseract-ocr/tesseract.git

Then, switch to your Tesseract directory with the command below:

cd tesseract

Now generate the build system (the configure script) by running the script below. Run it as your regular user; sudo is not needed here:

./autogen.sh

Then configure the build by running the script below (by default, Tesseract is installed under /usr/local):

./configure

When it finishes, compile Tesseract on Debian 11 with the following command:

make

This will take some time to complete.

Next, install the compiled program and libraries. This step needs root privileges:

sudo make install

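Then run the ldconfig command to update the shared library cache, so the system can find the newly installed Tesseract library:

sudo ldconfig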
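The clone, build, and install steps above can be chained into one bash function. This is a minimal sketch, assuming the prerequisite packages are installed and you run it as a non-root user with sudo; build_tesseract is a hypothetical name, and make -j"$(nproc)" (a parallel build on all CPU cores) is our addition. Each step runs only if the previous one succeeded, and only the install and ldconfig steps use sudo.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the source build steps.
# The subshell keeps the "cd" from changing your current directory.
build_tesseract() {
  (
    git clone https://github.com/tesseract-ocr/tesseract.git &&
      cd tesseract &&
      ./autogen.sh &&          # generate the configure script
      ./configure &&           # default install prefix: /usr/local
      make -j"$(nproc)" &&     # compile on all CPU cores
      sudo make install &&     # only the install steps need root
      sudo ldconfig            # refresh the shared library cache
  )
}
```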

If you also want to train Tesseract, compile the training tools with the following command:

make training

Finally, install the training tools with the following command:

sudo make training-install
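If you installed Tesseract both from APT (/usr/bin/tesseract) and from source (a default build installs /usr/local/bin/tesseract), two binaries exist and the one found first on your PATH is the one that runs. Bash's built-in type -a tesseract shows this; the hypothetical helper below spells out the same lookup, printing each tesseract executable in PATH order. Run tesseract --version afterwards to check which version the first one is.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every executable named "tesseract" on PATH,
# in lookup order. The first line is the binary your shell will run.
# Assumes PATH entries contain no glob characters such as * or ?.
tesseract_on_path() {
  local IFS=:   # split PATH on colons for the loop below
  local dir
  for dir in $PATH; do
    dir="${dir:-.}"   # an empty PATH entry means the current directory
    if [ -f "$dir/tesseract" ] && [ -x "$dir/tesseract" ]; then
      echo "$dir/tesseract"
    fi
  done
}
```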

For more information, you can visit the Tesseract OCR documentation.

Conclusion

At this point, you have learned how to set up Tesseract OCR on Debian 11, either from the Debian repository with APT or from source.

We hope you enjoy using it.
