In this guide, you will learn how to install Tesseract OCR on AlmaLinux 9.
Tesseract is an open-source optical character recognition (OCR) engine and one of the most popular OCR libraries.
OCR uses artificial intelligence to find and recognize text in images.
Tesseract looks for patterns in pixels, letters, words, and sentences. It uses a two-pass approach called adaptive recognition: the first pass recognizes characters, and the second pass fills in any letters it was not confident about with letters that fit the word or sentence context.
Steps To Install Tesseract OCR on AlmaLinux 9
To complete this guide, you must log in to your server as a non-root user with sudo privileges. To do this, you can follow our guide on Initial Server Setup with AlmaLinux 9.
Set up Tesseract OCR on AlmaLinux 9
In this section, you will install Tesseract on AlmaLinux 9 from source.
First, update your installed packages with the following command:
sudo dnf update -y
Install Required Packages and Dependencies
Then, install the packages required for building Tesseract on AlmaLinux 9:
sudo dnf install git automake make autoconf libtool clang gcc-c++.x86_64 wget -y
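Before moving on, it can help to confirm that the build tools are actually reachable. The following is a minimal sketch, not part of the original procedure: the helper name missing_tools is hypothetical, and the command list assumes the usual command names shipped by the packages above (for example, the gcc-c++ package provides g++).

```shell
# Print every command from the argument list that is not found on PATH.
# missing_tools is a hypothetical helper name used only in this sketch.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Commands expected from the packages installed above
# (assumption: gcc-c++ provides the g++ command).
missing=$(missing_tools git automake make autoconf libtool clang g++ wget)

if [ -z "$missing" ]; then
    echo "All build tools found."
else
    echo "Missing build tools:" $missing
fi
```

If anything is reported as missing, rerun the dnf install command and check its output for errors.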
Install the leptonica dependencies with the following command:
sudo dnf install zlib zlib-devel libjpeg libjpeg-devel libwebp libwebp-devel libtiff libtiff-devel libpng libpng-devel -y
Now you need to copy the shared libraries into /usr/local/lib with the following commands:
cd /usr/local/lib
sudo cp /usr/lib64/libjpeg.so.62 .
sudo cp /usr/lib64/libwebp.so.7 .
sudo cp /usr/lib64/libtiff.so.5 .
sudo cp /usr/lib64/libpng16.so.16 .
Clone Leptonica From Git
Next, clone the Leptonica repository from GitHub with the following commands:
cd ~
git clone https://github.com/DanBloomberg/leptonica.git --depth 1
Switch to your Leptonica directory:
cd leptonica
Compile and Build Leptonica
At this point, you can compile Leptonica with the following commands:
./autogen.sh
./configure --prefix=/usr/local --disable-shared --enable-static --with-zlib --with-jpeg --with-libwebp --with-libtiff --with-libpng --disable-dependency-tracking
sudo make
sudo make install
sudo ldconfig
Download Tesseract OCR on AlmaLinux 9
When your Leptonica installation is completed, you can download the latest version of Tesseract OCR on AlmaLinux 9 from GitHub. To do this, run the commands below:
cd ~
VER=$(curl -s https://api.github.com/repos/tesseract-ocr/tesseract/releases/latest|grep tag_name | cut -d '"' -f 4)
wget https://github.com/tesseract-ocr/tesseract/archive/refs/tags/$VER.tar.gz -O tesseract-5.tar.gz
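To see what the VER pipeline does without calling the GitHub API, here is a minimal sketch that runs the same grep and cut steps on a hand-written, abbreviated sample of the JSON. The sample text, the version value in it, and the variable names sample_json and VER_SAMPLE are illustrative assumptions, not real API output.

```shell
# Hand-written, abbreviated stand-in for the API response (illustrative only).
sample_json='{
  "tag_name": "5.0.0",
  "name": "5.0.0"
}'

# grep keeps the line containing tag_name; cut splits that line on double
# quotes and takes the 4th field, which is the tag value itself.
VER_SAMPLE=$(printf '%s\n' "$sample_json" | grep tag_name | cut -d '"' -f 4)

echo "$VER_SAMPLE"    # prints 5.0.0
```

If the real command leaves VER empty (for example, because the API request failed), the wget URL will be invalid, so it is worth running echo $VER before downloading.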
Then, extract your downloaded file:
tar zxvf tesseract-5.tar.gz
Switch to your Tesseract directory on AlmaLinux 9, which is named after the release tag:
cd tesseract-$VER
Compile Tesseract OCR
Now you need to configure the Tesseract OCR build with the following commands:
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
./autogen.sh
./configure --prefix=/usr/local --disable-shared --enable-static --with-extra-libraries=/usr/local/lib/ --with-extra-includes=/usr/local/lib/
Build and Install Tesseract OCR
At this point, you can build and install Tesseract on AlmaLinux 9 with the commands below:
sudo make
sudo make install
sudo ldconfig
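A quick way to confirm the build is usable is to ask the binary for its version. This is a small sketch rather than a required step; the variable name tess_status is only used here, and the hint about /usr/local/bin follows from the --prefix=/usr/local option used during configuration.

```shell
# Report whether the tesseract binary is reachable and, if so, print its version.
if command -v tesseract >/dev/null 2>&1; then
    tess_status="found"
    tesseract --version
else
    tess_status="missing"
    echo "tesseract is not on PATH; check that /usr/local/bin is included" >&2
fi
```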
When your installation is completed, you can load Tesseract languages.
Load Tesseract Languages
First, you need to create a directory for the language data with the following command:
mkdir -p /tess/traineddata
Then, export the Tesseract path by adding the below line to ~/.bashrc.
Note: You can replace $USER with the exact username on the system
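As a hedged sketch of this step, the helper below appends an export line for TESSDATA_PREFIX to a profile file only if one is not already present. The function name add_tessdata_prefix is hypothetical, and the directory you pass in is an assumption: it must match wherever you created your language directory.

```shell
# Append "export TESSDATA_PREFIX=<dir>" to a profile file unless the file
# already exports that variable. add_tessdata_prefix is a hypothetical helper.
# $1 = profile file (for example ~/.bashrc), $2 = traineddata directory
add_tessdata_prefix() {
    profile=$1
    dir=$2
    if grep -q '^export TESSDATA_PREFIX=' "$profile" 2>/dev/null; then
        echo "TESSDATA_PREFIX is already set in $profile"
    else
        printf 'export TESSDATA_PREFIX=%s\n' "$dir" >> "$profile"
    fi
}
```

For example, add_tessdata_prefix ~/.bashrc /tess/traineddata would use the directory from the mkdir command above; adjust the second argument if your directory lives elsewhere.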
Now source the profile with the following command:
source ~/.bashrc
At this point, you can add any trained data available in the GitHub tessdata repository to the path.
cd $TESSDATA_PREFIX
wget https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata
wget https://github.com/tesseract-ocr/tessdata/raw/main/fra.traineddata
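To confirm the downloads landed where Tesseract will look for them, a small check like the following can help. It is a sketch only: check_traineddata is a hypothetical helper name, and it simply tests that each expected file exists and is not empty.

```shell
# For each language code, report whether <dir>/<lang>.traineddata exists and
# is non-empty. Returns non-zero if any file is missing.
# check_traineddata is a hypothetical helper used only in this sketch.
# $1 = traineddata directory, remaining arguments = language codes
check_traineddata() {
    dir=$1
    shift
    rc=0
    for lang in "$@"; do
        if [ -s "$dir/$lang.traineddata" ]; then
            echo "ok: $lang"
        else
            echo "missing: $lang"
            rc=1
        fi
    done
    return "$rc"
}
```

For example: check_traineddata "$TESSDATA_PREFIX" eng fra. You can also run tesseract --list-langs to see which languages Tesseract itself detects.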
Now let’s see how to use Tesseract OCR.
How To Use Tesseract OCR on AlmaLinux 9
Once Tesseract OCR is installed on AlmaLinux 9, you can start extracting text from scanned documents or images.
To convert an image to a text file, you can use the syntax below:
tesseract <image_name> <output_file_name>
For example:
tesseract image.png new
The output will be a text file named new.txt containing the text recognized in image.png.
When using Tesseract OCR, you can specify the language you want to use with the -l flag. For example, to use Czech:
tesseract image.png new -l ces
You can also specify multiple languages by joining them with +:
tesseract image.png new -l ces+eng
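If you have many images, the same command can be wrapped in a loop. The following is a minimal sketch assuming all images are .png files in one directory; the function name ocr_directory is hypothetical, and each output file is named after its image (image.png produces image.txt).

```shell
# Run tesseract on every .png file in a directory.
# ocr_directory is a hypothetical helper used only in this sketch.
# $1 = directory with images, $2 = language spec for -l (for example eng or ces+eng)
ocr_directory() {
    dir=$1
    langs=$2
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue     # skip when no .png files matched
        base=${img%.png}              # image.png -> image, so output is image.txt
        tesseract "$img" "$base" -l "$langs"
    done
}
```

For example: ocr_directory ~/scans ces+eng, where ~/scans is a placeholder for your own image directory.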
At this point, you have learned how to install Tesseract OCR on AlmaLinux 9.
Hope you enjoy it.