
Using Tesseract-OCR on Ubuntu 20.04

Installing Tesseract-OCR

On Ubuntu 20.04, we install it the simplest way, as recommended by the official documentation:

sudo apt install tesseract-ocr

If you need to develop against Tesseract or train your own models, also install the developer tools:

sudo apt install libtesseract-dev 

After installation, check the version. Here it is 4.1.1:

tesseract -v
tesseract 4.1.1
 leptonica-1.79.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
 Found AVX512BW
 Found AVX512F
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4
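
If a program or build helper needs to act on the installed version, the first line of the output above can be parsed. The following is a minimal sketch using only the C++ standard library. `TessVersion` and `parse_tesseract_version` are hypothetical names introduced for illustration, and the sketch assumes the line has the form `tesseract MAJOR.MINOR.PATCH`, as in the output shown here.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper type for illustration; not part of Tesseract.
struct TessVersion {
    int major_no;
    int minor_no;
    int patch_no;
};

// Parses a line such as "tesseract 4.1.1" (the first line of `tesseract -v`).
// Returns false and leaves `out` untouched if the line does not match
// that assumed format.
bool parse_tesseract_version(const std::string& line, TessVersion& out) {
    TessVersion v{0, 0, 0};
    const int matched = std::sscanf(line.c_str(), "tesseract %d.%d.%d",
                                    &v.major_no, &v.minor_no, &v.patch_no);
    if (matched != 3) {
        return false;
    }
    out = v;
    return true;
}
```

Code that is already linked against libtesseract does not need to parse command output: the library exposes its version string through the static method `tesseract::TessBaseAPI::Version()`.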

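If you want a newer version such as 5.1.0, you need to download the source from the official repository and build it yourself. That is skipped here for now.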

GitHub - tesseract-ocr/tesseract: Tesseract Open Source OCR Engine (main repository)
https://github.com/tesseract-ocr/tesseract

Installing Qt Creator

Because the example here uses a simple Qt interface, we need to install the Qt Creator development tool.

sudo apt install qt5-default

Then download the installer from here:

Download Offline Installers | Source Package Offline Installer | Qt

For example, I downloaded version 6.0.2:

https://download.qt.io/official_releases/qtcreator/6.0/6.0.2/qt-creator-opensource-linux-x86_64-6.0.2.run

sudo chmod +x qt-creator-opensource-linux-x86_64-6.0.2.run 
./qt-creator-opensource-linux-x86_64-6.0.2.run 

That completes the installation.

In Qt Creator, choose New File or Project --> Non-Qt Project --> Plain C++ Application, then enter the following source code in main.cpp:

#include <cstdio>
#include <iostream>
#include <string>
#include <opencv2/opencv.hpp>
#include <opencv2/imgproc.hpp>
#include <leptonica/allheaders.h>
#include <tesseract/baseapi.h>
#include <tesseract/publictypes.h>

int main()
{
    // Alternative test image: "/home/mc/ocr/testimg/eurotext.png"
    std::string image_name = "/home/mc/ocr/testimg/testocr.png";
    cv::Mat imageMat = cv::imread(image_name);
    if (imageMat.empty())
    {
        printf("No image data \n");
        return -1;
    }
    //cv::Mat blurMat;
    //cv::medianBlur(imageMat, blurMat, 5); // optional: median blur
    cv::Mat z1, g_grayImage;
    cv::cvtColor(imageMat, z1, cv::COLOR_BGR2GRAY);            // convert to grayscale
    // cv::threshold(z1, z2, 214, 255, cv::THRESH_BINARY);     // alternative: fixed global threshold
    // Adaptive threshold: compare each pixel with the mean of its 7x7 neighborhood minus 25
    cv::adaptiveThreshold(z1, g_grayImage, 255, cv::ADAPTIVE_THRESH_MEAN_C, cv::THRESH_BINARY, 7, 25);

    // Show the preprocessed image; press any key to continue
    cv::namedWindow("Image1", cv::WINDOW_AUTOSIZE);
    cv::imshow("Image1", g_grayImage);
    cv::waitKey(0);

    tesseract::TessBaseAPI api;
    //if (api.Init(NULL, "chi_sim"))  // for Chinese
    // Init returns 0 on success
    if (api.Init("/home/mc/ocr/tesseract/tessdata_best-main", "eng", tesseract::OEM_DEFAULT))
    {
        std::cerr << "Could not initialize tesseract." << std::endl;
        return 1;
    }
    // Pix *image = pixRead("3.jpg");
    // Grayscale: 1 byte per pixel; step is the number of bytes per image row
    api.SetImage(g_grayImage.data, g_grayImage.cols, g_grayImage.rows, 1, static_cast<int>(g_grayImage.step));

    char* outText = api.GetUTF8Text();
    if (outText == nullptr)
    {
        std::cout << "No Data" << std::endl;
        api.End();
        return 1;
    }
    std::cout << outText << std::endl;
    // Release the text buffer, then the engine
    delete[] outText;    // pixDestroy(&image);
    api.End();

    return 0;
}
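
`GetUTF8Text()` returns a heap-allocated C string that the caller must release with `delete[]`, and the program above also checks it for a null pointer. One way to keep both concerns in a single place is to copy the buffer into a `std::string` and free it immediately. The sketch below uses only the standard library. `take_utf8_text` is a hypothetical helper name, not part of the Tesseract API, and it assumes the buffer was allocated with `new[]`.

```cpp
#include <memory>
#include <string>

// Hypothetical helper (not part of Tesseract): takes ownership of a buffer
// allocated with new[] and returns its contents as a std::string.
// A null pointer yields an empty string.
std::string take_utf8_text(char* raw) {
    // unique_ptr<char[]> calls delete[] when it goes out of scope,
    // including on early return or when an exception is thrown.
    std::unique_ptr<char[]> owner(raw);
    if (!owner) {
        return std::string();
    }
    return std::string(owner.get());
}
```

In the program above it could be used as `std::string text = take_utf8_text(api.GetUTF8Text());`, after which no manual `delete[]` is needed.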

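The Qt Creator project (.pro) file is shown below. Note that I am using gcc-8 rather than the default gcc-9; see my previous article for details:

Qt Creator error: fatal error: stdlib.h: No such file or directory (高精度计算机视觉's blog, CSDN)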

TEMPLATE = app
CONFIG += console c++11
CONFIG -= app_bundle
CONFIG -= qt

SOURCES += \
        main.cpp

unix:!macx: LIBS += -L$$PWD/../../../../usr/local/lib/ -lopencv_world

INCLUDEPATH += $$PWD/../../../../usr/local/include/opencv4
DEPENDPATH += $$PWD/../../../../usr/local/include/opencv4

unix:!macx: LIBS += -L$$PWD/../../../../usr/lib/x86_64-linux-gnu/ -ltesseract

#INCLUDEPATH += $$PWD/../../../../usr/include
#DEPENDPATH += $$PWD/../../../../usr/include

INCLUDEPATH += /usr/include/c++/8

Reference: the official installation instructions

Ubuntu

If they are not already installed, you need the following libraries (Ubuntu 16.04/14.04):

sudo apt-get install g++ # or clang++ (presumably)
sudo apt-get install autoconf automake libtool
sudo apt-get install pkg-config
sudo apt-get install libpng-dev
sudo apt-get install libjpeg8-dev
sudo apt-get install libtiff5-dev
sudo apt-get install zlib1g-dev

If you plan to install the training tools, you also need the following libraries:

sudo apt-get install libicu-dev
sudo apt-get install libpango1.0-dev
sudo apt-get install libcairo2-dev

Leptonica

You also need to install Leptonica. Ensure that the development headers for Leptonica are installed before compiling Tesseract.

Tesseract versions and the minimum version of Leptonica required:

Tesseract    Leptonica    Ubuntu
4.00         1.74.2       Ubuntu 18.04
3.05         1.74.0       Must build from source
3.04         1.71         Ubuntu 16.04
3.03         1.70         Ubuntu 14.04
3.02         1.69         Ubuntu 12.04
3.01         1.67
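
The table above can also be expressed as a small lookup, for example in a helper that checks prerequisites before a build. This is only a sketch using the C++ standard library. `min_leptonica_for` is a hypothetical name, and the entries are copied from the table as listed.

```cpp
#include <map>
#include <string>

// Hypothetical helper for illustration: returns the minimum Leptonica version
// listed in the table for a given Tesseract release, or an empty string if
// the release does not appear in the table.
std::string min_leptonica_for(const std::string& tesseract_release) {
    static const std::map<std::string, std::string> table = {
        {"4.00", "1.74.2"},
        {"3.05", "1.74.0"},
        {"3.04", "1.71"},
        {"3.03", "1.70"},
        {"3.02", "1.69"},
        {"3.01", "1.67"},
    };
    const auto it = table.find(tesseract_release);
    return it == table.end() ? std::string() : it->second;
}
```

Releases that the table does not list, such as 4.1.1 or 5.1.0, yield an empty string, so the caller has to decide how to handle them.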

One option is to install the distro’s Leptonica package:

sudo apt-get install libleptonica-dev

but if you are using an oldish version of Linux, the Leptonica version may be too old, so you will need to build from source.

The sources are at https://github.com/DanBloomberg/leptonica . The instructions for building are given in Leptonica README.

Note that if building Leptonica from source, you may need to ensure that /usr/local/lib is in your library path. This is a standard Linux bug, and the information at Stackoverflow is very helpful.

Installing Tesseract from Git

Please follow instructions in Compiling–GitInstallation

Also read Install Instructions

Install elsewhere / without root

Tesseract can be configured to install anywhere, which makes it possible to install it without root access.

To install it in $HOME/local:

./autogen.sh
./configure --prefix=$HOME/local/
make
make install

To install it in $HOME/local using Leptonica libraries also installed in $HOME/local:

./autogen.sh
LIBLEPT_HEADERSDIR=$HOME/local/include ./configure \
  --prefix=$HOME/local/ --with-extra-libraries=$HOME/local/lib
make
make install

On some systems, you might also need to specify the pkg-config path before running the configure script:

export PKG_CONFIG_PATH=$HOME/local/lib/pkgconfig
