Install Tesseract OCR on Debian 12 via Terminal

In this guide, you will learn to install Tesseract OCR on Debian 12 via the terminal, either from the APT repository or from source. Tesseract OCR (Optical Character Recognition) is an open-source tool for extracting text from images or scanned documents. It was originally developed at Hewlett-Packard, was later sponsored by Google, and is now maintained by the open-source community. Tesseract is one of the most widely used OCR engines available.

Follow the rest of the article to install Tesseract OCR from the APT repository or to build the latest version from source.

What you will read in this post:

Easily Learn to Install Tesseract OCR on Debian 12 via Terminal
Method 1 – Installing Tesseract OCR from Debian APT Repository
Method 2 – Get Tesseract OCR From Source on Debian 12
Basic Commands for Using Tesseract OCR
Conclusion

Easily Learn to Install Tesseract OCR on Debian 12 via Terminal

To complete this guide, you must log in to your server as a non-root user with sudo privileges. For this purpose, you can check the Debian 12 Initial Setup Guide.

Then, follow the steps below to install Tesseract OCR on Debian 12.

Method 1 – Installing Tesseract OCR from Debian APT Repository

The Tesseract OCR package is available in the default Debian 12 repository, so you can update the package index and install it with the following commands:

sudo apt update
sudo apt install tesseract-ocr -y

The tesseract binary is installed in /usr/bin, and its language data and configuration files are placed under the /usr/share/tesseract-ocr directory.
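
To confirm that the installation worked, a quick check along the following lines can help. This is only a minimal sketch: the variable name tess_status is illustrative and not part of Tesseract.

```shell
#!/bin/sh
# Check that the tesseract binary is on PATH, then print its version
# and the language data files it can find.
if command -v tesseract >/dev/null 2>&1; then
    tesseract --version
    tesseract --list-langs
    tess_status="installed"
else
    echo "tesseract not found in PATH" >&2
    tess_status="missing"
fi
```

If the language list is missing a language you need, that language's data has not been installed yet.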

At this point, you can also install ImageMagick. It provides a wide range of functions for converting, composing, editing, and displaying images in various formats. ImageMagick is commonly used together with Tesseract OCR to preprocess images before performing optical character recognition.

To install it, run the command below:

sudo apt install imagemagick -y
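
As a rough illustration of the preprocessing mentioned above, the sketch below converts an image to grayscale and doubles its size before OCR. The function name preprocess_for_ocr and the chosen settings are assumptions for the example, not recommended values; the right preprocessing depends on your images.

```shell
#!/bin/sh
# preprocess_for_ocr INPUT OUTPUT
# Hypothetical helper: converts INPUT to grayscale and doubles its size,
# which may help with small or low-contrast text.
preprocess_for_ocr() {
    input=$1
    output=$2
    if [ ! -f "$input" ]; then
        echo "input file not found: $input" >&2
        return 1
    fi
    if ! command -v convert >/dev/null 2>&1; then
        echo "ImageMagick 'convert' not found in PATH" >&2
        return 1
    fi
    # -colorspace Gray drops color; -resize 200% scales the image up.
    convert "$input" -colorspace Gray -resize 200% "$output"
}
```

For example, with a placeholder file named scan.png, you could run preprocess_for_ocr scan.png scan-clean.png and then pass scan-clean.png to Tesseract.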

Method 2 – Get Tesseract OCR From Source on Debian 12

In this method, you can download and install the latest Tesseract OCR from the source. To do this, install the required packages with the command below:

sudo apt install automake ca-certificates g++ git libtool libleptonica-dev make pkg-config libpango1.0-dev

Then, use the following command to clone the latest Tesseract OCR from GitHub:

git clone https://github.com/tesseract-ocr/tesseract.git

Once your download is completed, switch to your Tesseract directory:

cd tesseract

Next, use the command below to generate the build configuration scripts:

sudo ./autogen.sh

Now run the following commands to configure, build, and install Tesseract, and then refresh the shared library cache:

sudo ./configure
sudo make
sudo make install
sudo ldconfig

You can also build and install the training tools. In Tesseract OCR, the training tools are a set of utilities and scripts provided by the Tesseract project for training custom language data and improving recognition accuracy for specific languages, fonts, or styles of text. To install them, use the following commands:

sudo make training
sudo make training-install

Once the installation is complete, you can start using Tesseract OCR on Debian 12.
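
Note that a source build does not include any language data, so Tesseract cannot recognize text until at least one traineddata file is in place. The sketch below shows one way to fetch it. It assumes that ./configure was run with the default prefix (so the data directory is /usr/local/share/tessdata), that you want the files from the tessdata_fast repository on GitHub, and that wget is installed, which the package list above does not cover. The function names are hypothetical.

```shell
#!/bin/sh
# Assumption: ./configure was run without --prefix, so language data
# is looked up in /usr/local/share/tessdata.
TESSDATA_DIR="/usr/local/share/tessdata"

# traineddata_url LANG
# Prints the download URL for a language file in the tessdata_fast repository.
traineddata_url() {
    echo "https://github.com/tesseract-ocr/tessdata_fast/raw/main/$1.traineddata"
}

# install_traineddata LANG
# Downloads the language file into TESSDATA_DIR (needs wget and sudo).
install_traineddata() {
    sudo mkdir -p "$TESSDATA_DIR" &&
    sudo wget -O "$TESSDATA_DIR/$1.traineddata" "$(traineddata_url "$1")"
}
```

After running install_traineddata eng, the command tesseract --list-langs should list eng.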

Basic Commands for Using Tesseract OCR

This section covers the most common basic commands for Tesseract OCR:

Basic Tesseract OCR Syntax:

tesseract [input_image] [output_text]

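This command performs OCR on the specified input image and saves the recognized text to a text file. The output argument is a base name: Tesseract appends the .txt extension itself, so an output of result produces result.txt.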

Specify your desired language:

tesseract [input_image] [output_text] -l [language_code]

Replace [language_code] with the code of the language you want to recognize. For example, use eng for English. The data for that language must be installed.
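
To make the syntax above concrete, here is a small sketch of a wrapper that runs OCR with a given language and prints the result. The function name ocr_to_text and the file names in the usage note are placeholders for illustration.

```shell
#!/bin/sh
# ocr_to_text IMAGE [LANGS]
# Hypothetical wrapper: runs OCR on IMAGE with the given language code(s),
# defaulting to eng, and prints the recognized text.
ocr_to_text() {
    image=$1
    langs=${2:-eng}
    if [ ! -f "$image" ]; then
        echo "image not found: $image" >&2
        return 1
    fi
    out_base=$(mktemp) || return 1
    # Tesseract appends ".txt" to the output base name.
    tesseract "$image" "$out_base" -l "$langs" && cat "$out_base.txt"
    status=$?
    rm -f "$out_base" "$out_base.txt"
    return $status
}
```

For example, ocr_to_text letter.png eng prints the English text found in letter.png. Several languages can be combined with a plus sign, as in ocr_to_text letter.png eng+deu, assuming the German data is installed as well (on Debian, through the tesseract-ocr-deu package).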

PDF Output:

tesseract [input_image] [output_pdf] pdf

This command generates a searchable PDF file containing the input image together with the recognized text. As with text output, Tesseract appends the .pdf extension to the output name.
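
If you have a whole folder of scans, the PDF command above can be wrapped in a loop. The following is a minimal sketch that assumes the images are PNG files; the function name batch_ocr_pdf is hypothetical.

```shell
#!/bin/sh
# batch_ocr_pdf DIR
# Hypothetical helper: creates a searchable PDF next to every PNG in DIR
# and prints how many files were converted.
batch_ocr_pdf() {
    dir=$1
    count=0
    for img in "$dir"/*.png; do
        # Skip the unexpanded pattern when DIR contains no PNG files.
        [ -f "$img" ] || continue
        # "${img%.png}" strips the extension; Tesseract adds ".pdf" itself.
        if tesseract "$img" "${img%.png}" pdf; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Calling batch_ocr_pdf scans would turn scans/page1.png into scans/page1.pdf, and so on for each PNG file in that directory.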

Specify OCR Engine Mode:

tesseract [input_image] [output_text] --oem [mode]

You can use this command to specify the OCR engine mode. The available modes are:

0: Legacy engine
1: Neural nets LSTM engine
2: Legacy + LSTM engines
3: Default, based on what is available

These are some of the basic commands for using Tesseract OCR. You can find more options and parameters in the official documentation.

Conclusion

Installing Tesseract OCR on Debian 12 via the terminal is a straightforward process that gives you powerful optical character recognition capabilities for extracting text from images or scanned documents. I hope you enjoyed it.
