java - When using Tess4j to read a pdf image, only the first heading line is returned as a string result the rest of the image i

Updated: 2025-01-089

I am using Tess4J 5.13.0 (Tess4j-5.13.0.jar) from Java to read a PDF that contains a table-like image. It's my first time using Tess4J/Tesseract.

Tess4j is located here :

The pdf I am trying to convert :

The problem is that when the PDF is processed, only the first heading line is returned and the rest is ignored.

The PDF contains a single image that looks like a table with a heading. The heading is returned, but the table rows are not. One extra string is also returned, and I do not know where it comes from: "-ma_———"

This is the code I used:

import java.io.File;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

// Class name is illustrative; the post did not show the class declaration.
public class PdfOcr {
    public static void main(String[] args) {
        File imageFile = new File("C:/Users/DFDS_Y1_2025.pdf");
        ITesseract instance = new Tesseract(); // JNA Interface Mapping
        instance.setDatapath("C:/Users/Tess4J/tessdata"); // folder holding the traineddata files
        instance.setLanguage("eng");

        // Disabled attempt: write the OCR result out as a PDF document.
        //List<RenderedFormat> renderFormats = new ArrayList<RenderedFormat>();
        //renderFormats.add(RenderedFormat.PDF);
        //instance.createDocumentsWithResults(imageFile,null,"C:/Users/DFDS_Y1_2025_out2", renderFormats, TessPageIteratorLevel.RIL_BLOCK);

        try {
            // Run OCR on the PDF and print the recognized text.
            String result = instance.doOCR(imageFile);
            System.out.println(result);
        } catch (TesseractException e) {
            System.out.println("ERROR");
            System.err.println(e.getMessage());
        }
    }
}

The result that gets printed to the console is:

Destination Rate O-1OT Rate 10.01-17T Full rate

-ma_———

So it's the heading plus, for some reason, this extra string: -ma_———

I was expecting all the other rows of data to be returned.

I also tried extracting the image from the PDF, converting it to grayscale, and using that image file as input instead of the PDF, but I got the same result. I went through the online examples and their code is similar to mine, so I can't see what I have to do to get the rest of the data.
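For reference, the grayscale step described above can be done with the JDK alone. The sketch below is only an illustration under assumptions: the class name `GrayscaleSketch` and the synthetic 200x100 image are hypothetical stand-ins, since the post does not show how the image was extracted or converted.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

class GrayscaleSketch {

    /** Returns a grayscale copy of the given image; the source is left unchanged. */
    public static BufferedImage toGrayscale(BufferedImage source) {
        BufferedImage gray = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        try {
            // Drawing into a TYPE_BYTE_GRAY target performs the color conversion.
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return gray;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the page image extracted from the PDF.
        BufferedImage page = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        BufferedImage gray = toGrayscale(page);
        System.out.println(gray.getWidth() + "x" + gray.getHeight()
                + ", grayscale: " + (gray.getType() == BufferedImage.TYPE_BYTE_GRAY));
    }
}
```

In a real run the source would be loaded with `ImageIO.read(...)`, and the converted `BufferedImage` could be handed to Tess4J's `doOCR(BufferedImage)` overload instead of a `File`.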

I am using Eclipse, and this is the console output when I run the code:

I know this can be done with Tesseract because I tested it using the Scribe UI, which is based on Tesseract.

When the PDF is uploaded to Scribe, it extracts all of the text in the image.

I am not sure what I am doing wrong; the PDF is clear and should work. Should the image or PDF be preprocessed, or is there something else I am missing?
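As an illustration of what "preprocessing" could mean here, the sketch below upscales an image and then converts it to pure black and white using only the JDK. Everything in it is an assumption made for the example: the class name `OcrPreprocessSketch`, the scale factor of 2, and the fixed threshold of 128 are hypothetical, and nothing in the post establishes that these steps recover the missing rows.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

class OcrPreprocessSketch {

    /** Scales the image by an integer factor using bicubic interpolation. */
    public static BufferedImage upscale(BufferedImage source, int factor) {
        int width = source.getWidth() * factor;
        int height = source.getHeight() * factor;
        BufferedImage scaled = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = scaled.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(source, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return scaled;
    }

    /** Maps pixels with luminance below the threshold to black, all others to white. */
    public static BufferedImage binarize(BufferedImage source, int threshold) {
        BufferedImage out = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < source.getHeight(); y++) {
            for (int x = 0; x < source.getWidth(); x++) {
                int rgb = source.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceptual luminance (0..255).
                int luminance = (299 * r + 587 * g + 114 * b) / 1000;
                out.setRGB(x, y, luminance < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the table image; a real run would load it with ImageIO.read.
        BufferedImage page = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        BufferedImage prepared = binarize(upscale(page, 2), 128);
        System.out.println(prepared.getWidth() + "x" + prepared.getHeight());
    }
}
```

A fixed threshold is the simplest choice and suits clean, evenly lit scans; uneven backgrounds would usually call for an adaptive method instead.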

Please let me know if you need more info.

Any help would be appreciated.

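Copyright notice: this article, titled "java - When using Tess4j to read a pdf image, only the first heading line is returned as a string result the rest of the image i", was contributed by a site user, and the views expressed are solely the author's. To reprint it, contact the author and cite the source: http://www.betaflare.com/web/1736282379a1926626.html. This site only provides information storage space; it holds no ownership and bears no related legal liability. Any content on this site found to involve plagiarism, infringement, or a violation of laws or regulations will be removed promptly once verified.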