In this guide, we want to teach you how to install Tesseract OCR on Ubuntu 22.04.
Tesseract is an open-source optical character recognition (OCR) engine. OCR extracts text from images and from documents that have no text layer, and writes the result to a new searchable text file, PDF, or another common format.
Tesseract is highly customizable and can work with most languages, including multilingual documents and vertical text. Although the software also runs on Windows and macOS, this guide is based on Ubuntu 22.04, where everything is done from the terminal.
How To Install Tesseract OCR on Ubuntu 22.04
To install Tesseract OCR, you need to log in to your server as a non-root user with sudo privileges. To do this, you can follow our guide on Initial Server Setup with Ubuntu 22.04.
In this guide, you will learn to install Tesseract OCR from the APT repository and to build the latest version from source.
Follow the steps below to complete this guide.
Install Tesseract OCR on Ubuntu 22.04 with APT
Tesseract OCR is available in the default Ubuntu repository.
First, you need to update your local package index with the following command:
sudo apt update -y
Then, use the command below to install Tesseract OCR on Ubuntu 22.04:
sudo apt install tesseract-ocr
The Tesseract language data files are installed under /usr/share/tesseract-ocr/4.00/tessdata.
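To confirm that the APT install worked, you can check the version and list the language data that Tesseract can see. The following is a minimal sketch: the function name check_tesseract is a hypothetical helper made up for this example, and it only relies on the --version and --list-langs options of the tesseract command.

```shell
# check_tesseract: hypothetical helper that reports whether Tesseract is usable.
check_tesseract() {
    # Fail early if the tesseract command cannot be found.
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract not found" >&2
        return 1
    fi
    # Print only the first line of the version banner.
    tesseract --version 2>&1 | head -n 1
    # List the languages whose data files are installed.
    tesseract --list-langs
}

# Usage: check_tesseract
```

If a language you need is missing from the list, Ubuntu packages the data files separately. For example, sudo apt install tesseract-ocr-deu adds German.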
You may also want the convert command, which is useful for preparing images before OCR. It converts between image formats and can resize, blur, crop, despeckle, dither, draw on, flip, join, and re-sample an image, among other operations.
This tool is provided by ImageMagick. To install it, run the command below:
sudo apt install imagemagick
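As an illustration of how convert can prepare an image for OCR, here is a small sketch. The function name prepare_for_ocr and the chosen settings (grayscale, 200% resize) are assumptions for this example, not recommended values. Whether they help depends on your images.

```shell
# prepare_for_ocr: hypothetical helper that writes a grayscale, upscaled copy of an image.
# Arguments: $1 = input image, $2 = output image.
prepare_for_ocr() {
    if [ "$#" -ne 2 ]; then
        echo "usage: prepare_for_ocr <input> <output>" >&2
        return 2
    fi
    # -colorspace Gray drops color information;
    # -resize 200% doubles the pixel dimensions.
    convert "$1" -colorspace Gray -resize 200% "$2"
}

# Usage: prepare_for_ocr scan.jpg scan-clean.png
```

The file names in the usage line are placeholders. Writing the result to a new file keeps the original image untouched.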
Now you can test your Tesseract installation.
To do this, find an image that contains text and then execute the following command:
tesseract <image_name> <output_file_name>
Tesseract extracts the text from the image and writes it to the output file, adding the .txt extension to the name you give. It works best on printed text. To recognize handwriting, you have to train it.
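If you have more than one image, the same command can be run in a loop. The sketch below assumes a directory of PNG files. The function name ocr_dir is a hypothetical helper for this example, and the language code defaults to eng.

```shell
# ocr_dir: hypothetical helper that runs Tesseract on every PNG in a directory.
# Arguments: $1 = directory with images, $2 = language code (defaults to eng).
ocr_dir() {
    dir="$1"
    lang="${2:-eng}"
    for img in "$dir"/*.png; do
        # Skip the unexpanded pattern when no PNG files match.
        [ -e "$img" ] || continue
        # The second argument is the output base name;
        # Tesseract appends .txt to it itself.
        tesseract "$img" "${img%.png}" -l "$lang"
    done
}

# Usage: ocr_dir ./scans eng
```

Each image gets a text file next to it, so ./scans/page1.png produces ./scans/page1.txt. To get a searchable PDF instead of plain text for a single image, you can add pdf at the end of the command, as in tesseract page1.png page1 pdf.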
Install Tesseract OCR from the source
Another way to install Tesseract OCR is to build it from the source. This method works on any Linux distribution.
Install the required packages on Ubuntu 22.04 with the command below:
sudo apt install automake ca-certificates g++ git libtool libleptonica-dev make pkg-config libpango1.0-dev
Get Tesseract OCR with the command below:
git clone https://github.com/tesseract-ocr/tesseract.git
Then, switch to your Tesseract directory with the command below:
cd tesseract
Now you need to generate the Tesseract build files on Ubuntu 22.04. To do this, run the script below:
./autogen.sh
Next, configure the build for your system by running the script below:
./configure
When you are done, start compiling Tesseract on Ubuntu 22.04 with the following command:
make
This will take some time to complete.
Next, install the compiled files with the following command:
sudo make install
Then, run the ldconfig command to refresh the shared library cache:
sudo ldconfig
Now you need to compile the training tools. To do this, run the following command:
sudo make training
Finally, install the training tools with the following command:
sudo make training-install
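A source build does not include any language data, so Tesseract needs a .traineddata file before it can recognize text. The files are published in the Tesseract project's tessdata repositories. The sketch below only checks whether a language file is in place. The function name has_traineddata is a hypothetical helper, and /usr/local/share/tessdata is assumed to be the data directory of a build that used the default install prefix. Adjust it if you configured a different prefix.

```shell
# has_traineddata: hypothetical helper that checks a tessdata directory for a language file.
# Arguments: $1 = language code, $2 = tessdata directory.
# The directory defaults to $TESSDATA_PREFIX if set, otherwise to
# /usr/local/share/tessdata (an assumed default for source builds).
has_traineddata() {
    lang="$1"
    dir="${2:-${TESSDATA_PREFIX:-/usr/local/share/tessdata}}"
    if [ -f "$dir/$lang.traineddata" ]; then
        echo "found $dir/$lang.traineddata"
    else
        echo "missing $dir/$lang.traineddata" >&2
        return 1
    fi
}

# Usage: has_traineddata eng
```

If the file is reported as missing, download the language file you need and copy it into that directory, then run tesseract --list-langs to confirm that Tesseract sees it.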
For more information, you can visit the Tesseract OCR documentation.
At this point, you have learned how to install Tesseract OCR on Ubuntu 22.04.
Hope you enjoy it.