Easy Way to Install Tesseract OCR on Ubuntu 22.04

In this guide, you will learn how to install Tesseract OCR on Ubuntu 22.04. Tesseract is an open-source optical character recognition (OCR) engine. OCR extracts text from images and from documents that have no text layer, and writes the result to a searchable text file, PDF, or other common formats.

Tesseract is highly customizable and supports most languages, including multilingual documents and vertical text. Although the software also runs on Windows and macOS, this guide covers Ubuntu 22.04, where every step is done from the terminal.

Now follow the steps below from the Orcacore website to install Tesseract OCR on Ubuntu 22.04.

2 Methods To Install Tesseract OCR on Ubuntu 22.04

To install Tesseract OCR, log in to your server as a non-root user with sudo privileges. To set this up, you can follow our guide on Initial Server Setup with Ubuntu 22.04.

In this guide, you will learn to install Tesseract OCR from the APT repository and to build the latest version from source.

Follow the steps below to complete this guide.

Method 1 – Install Tesseract OCR on Ubuntu 22.04 with APT

Tesseract OCR is available in the default Ubuntu repository.

First, you need to update your local package index with the following command:

sudo apt update -y

Then, use the command below to install Tesseract OCR on Ubuntu 22.04:

sudo apt install tesseract-ocr

The tesseract binary is installed in /usr/bin, and its language data files are placed under /usr/share/tesseract-ocr/4.00/tessdata.
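After the APT install finishes, a quick check confirms that the binary is on your PATH and shows which languages it can use. The sketch below wraps that check in a small helper; the function name check_tesseract is made up for illustration, while --version and --list-langs are standard tesseract options.

```shell
# Print the installed Tesseract version and the languages it can use.
# Returns non-zero if the tesseract command is not on PATH.
# (check_tesseract is a hypothetical helper name, not part of Tesseract.)
check_tesseract() {
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract is not installed" >&2
        return 1
    fi
    tesseract --version
    tesseract --list-langs
}
```

Call it with check_tesseract. Additional languages are packaged separately on Ubuntu, for example tesseract-ocr-deu for German, and show up in the language list once installed.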

ImageMagick's convert command is handy for preparing images before OCR. It converts between image formats and can resize, blur, crop, despeckle, dither, draw on, flip, join, and re-sample an image, among other things.

To install ImageMagick, run the command below:

sudo apt install imagemagick
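As a rough sketch of how convert can prepare an image for OCR, the helper below turns the image to grayscale and doubles its size. The function name preprocess_image and the 200% factor are assumptions chosen for illustration; suitable settings depend on your scans.

```shell
# Convert an image to grayscale and upscale it before OCR.
# Usage: preprocess_image <input_image> <output_image>
# (preprocess_image is a hypothetical helper; 200% is an example value.)
preprocess_image() {
    convert "$1" -colorspace Gray -resize 200% "$2"
}
```

For example, preprocess_image scan.jpg scan-clean.png writes a cleaned-up copy that you can then pass to Tesseract.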

Now you can test Tesseract.

To do this, find an image that contains text, and then run the following command:

tesseract <image_name> <output_file_name>

Tesseract extracts the text from the image and saves it to a .txt file with the output name you specified. It works best on clean, printed text such as word-processor documents. To recognize handwriting, you have to train it.
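To make the test command more concrete, here is a minimal sketch of two wrappers around it. The function names ocr_image and ocr_to_pdf are hypothetical; the stdout output name, the -l language option, and the pdf output format are standard tesseract features.

```shell
# Run OCR on one image and print the recognized text to the terminal.
# Usage: ocr_image <image> [language]   (language defaults to eng)
ocr_image() {
    # Using "stdout" as the output name prints the text instead of writing a file.
    tesseract "$1" stdout -l "${2:-eng}"
}

# Run OCR on one image and write a searchable PDF named <output_name>.pdf.
# Usage: ocr_to_pdf <image> <output_name>
ocr_to_pdf() {
    tesseract "$1" "$2" pdf
}
```

For example, ocr_image invoice.png prints the English text found in invoice.png, and ocr_to_pdf invoice.png invoice creates invoice.pdf.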

Method 2 – Install Tesseract OCR from the source

Another way to install Tesseract OCR is to build it from source. This method works on any Linux distribution, although the package names below are specific to Ubuntu.

Install the required packages on Ubuntu 22.04 with the command below:

sudo apt install automake ca-certificates g++ git libtool libleptonica-dev make pkg-config libpango1.0-dev

Get Tesseract OCR from GitHub with the command below:

git clone https://github.com/tesseract-ocr/tesseract.git

Then, switch to your Tesseract directory on Ubuntu 22.04 with the command below:

cd tesseract

Now you need to generate the Tesseract build files. The build steps do not need root privileges, so run them as your regular user. Start with the script below:

./autogen.sh

Next, configure the build by running the script below:

./configure

When you are done, start compiling Tesseract with the following command:

make

This will take some time to complete.

Next, run the following command to install the compiled files:

sudo make install

Then update the shared library cache with the ldconfig command:

sudo ldconfig

Now you need to compile the training tools. To do this, run the following command:

sudo make training

Finally, install the training tools with the following command:

sudo make training-install
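A source build does not include language models, so Tesseract needs at least one .traineddata file before it can recognize text. The sketch below downloads a model into the data directory of a source install. It assumes the default /usr/local prefix, so the directory is /usr/local/share/tessdata, and it assumes the models are fetched from the tessdata_fast repository on GitHub. The function name fetch_traineddata is hypothetical, and wget must be installed.

```shell
# Download one language model into the tessdata directory of a source install.
# Usage: fetch_traineddata <lang> [tessdata_dir]
# Assumptions: default install prefix (/usr/local) and the tessdata_fast
# repository as the download source; adjust both if your setup differs.
fetch_traineddata() {
    sudo wget -O "${2:-/usr/local/share/tessdata}/$1.traineddata" \
        "https://github.com/tesseract-ocr/tessdata_fast/raw/main/$1.traineddata"
}
```

For example, fetch_traineddata eng installs the English model. Running tesseract --list-langs afterwards should then list eng.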

That’s it, you are done. For more information, you can visit the Tesseract OCR documentation.

Conclusion

At this point, you have learned how to install Tesseract OCR on Ubuntu 22.04, both from the APT repository and by building the latest version from source. I hope you find it useful.
