IRIS PDF OCR plug-in trouble
Thread poster: Pavel Slama
Pavel Slama  Identity Verified
United Kingdom
Local time: 05:27
Member (2014)
English to Czech
Aug 23, 2017

Good afternoon. I was quite excited about the new OCR feature, which should let Trados’s OCR support my languages, such as Czech.

However, I am probably doing something wrong: I’ve installed the plug-in and enabled it in Options, but attempts to OCR a Czech document via Trados still produce illegible gibberish, as though the software had not recognized the language of the document (and had perhaps defaulted to English).

I installed the plug-in first, but when I subsequently ticked it in Options, a message told me that I should download and install it.

Thanks for any advice.


 
CafeTran Training (X)
Netherlands
Local time: 06:27
Words glued together? Aug 23, 2017

Pavel Slama wrote:

Good afternoon. I was quite excited about the new OCR feature


I watched the video and I was wondering: are these words like "Itis rowthe mostLiked etc." really glued together?

Screen Shot 2017-08-23 at 18.43.06

If so, I'd say it's a rather poor result of Iris' OCR.


 
Pavel Slama  Identity Verified
United Kingdom
Local time: 05:27
Member (2014)
English to Czech
TOPIC STARTER
Czech example Aug 23, 2017

OK, so to be more specific, I’ll give a very straightforward example.

Original:
Capture

Google Docs built-in OCR (0 mistakes in this paragraph):
Capture2

Trados with IRIS OCR:
Capture3

But I’m still hoping there may be a human factor on my part.


 
José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 02:27
English to Portuguese
In memoriam
Butting in... Aug 23, 2017

Though my late parents were Polish, I don't speak any of it. Nor Czech, if that matters.

However I see that the ž (CZ) was OCR'd as ż (PL).
Is there any chance your program was set up for Polish (too)?

I had such experience with an ancient OCR program (can't recall its name), where ó (PT) was OCR'd as 6, until I realized that it was still set for EN, in spite of my insistent setting for PT.


 
RWS Community
United Kingdom
Local time: 06:27
English
You watched the video? Aug 23, 2017

CafeTran Training wrote:

If so, I'd say it's a rather poor result of Iris' OCR.


If that was IRIS, I'd agree with you.


 
Pavel Slama  Identity Verified
United Kingdom
Local time: 05:27
Member (2014)
English to Czech
TOPIC STARTER
That’s what I’m wondering, whether my setup’s right Aug 23, 2017

José Henrique Lamensdorf wrote:
... any chance your program was set up for Polish (too)?


It’s not my programme; it’s the brand-new plug-in made by SDL themselves, I believe. It is entirely possible that the setup is not right, and that's why I’m asking for help. I used it from a project set up as CS>EN.

Otherwise, well done, José, for recognizing Polish characters where Czech ones should be.


 
RWS Community
United Kingdom
Local time: 06:27
English
I copied your image... Aug 23, 2017

Pavel Slama wrote:

OK, so to be more specific, I’ll give a very straightforward example.

Original:
Capture

But I’m still hoping there may be a human factor on my part.


... with a screen capture and saved as a PDF. Then opened the PDF in Studio using IRIS. Doesn't look as bad as your test to me. Are you sure you used IRIS?

https://www.dropbox.com/s/byi088wuew3wcfq/cz_iris.jpg?dl=0

Regards

Paul
Why not try the new SDL Community


[Edited at 2017-08-23 22:36 GMT]


 

