Problems with OCR and small text
Thread poster: James Greenfield
James Greenfield
United Kingdom
Member (since 2013)
French to English
Nov 29, 2015

Hi,

I am currently translating a dead PDF. I managed to OCR the document and the results were fine apart from the bibliography section at the end, which is in very small print. The results for this section make no sense. Using ABBYY FineReader I tried to increase the resolution, but the results were just as bad. Does anyone have any advice? This is the first time I have had this problem. Perhaps someone with ABBYY FineReader could guide me on how to increase the resolution properly. When I try to do this, the image size automatically becomes smaller and it is still unable to recognise the text. Many thanks for any advice.


 
Sergei Leshchinsky
Ukraine
Member (since 2008)
English to Russian
can you Nov 29, 2015

send me the file?

also, if it is raster, then all you have is all you have.


 
James Greenfield
United Kingdom
Member (since 2013)
French to English
TOPIC STARTER
email Nov 29, 2015

Sergei Leshchinsky wrote:

send me the file?

also, if it is raster, then all you have is all you have.


Thanks, I've just sent you an email.


 
James Greenfield
United Kingdom
Member (since 2013)
French to English
TOPIC STARTER
could anyone help? Nov 29, 2015

I don't suppose anyone has really powerful OCR software and would be prepared to do me a massive favour? I can't manage to OCR the bibliography, which is in small text, and typing the 64 entries by hand is going to take me a long time. Thanks very much.

 
Melissa McMahon
Australia
French to English
Not sure if post-facto solutions will help Nov 29, 2015

Hi James,

I'm not an expert, but I think if the scan of the original document was not a high enough resolution, then attempts to increase the resolution of the scan won't help, because the "raw material" is inadequate. If I take a blurry photo of something, no amount of fiddling with the sharpness or resolution of the photo will give me a clear photo. I think the only alternative to typing out the text is to get a better scan.

Good luck!
Melissa
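
To put rough numbers on the "raw material" point, here is a minimal sketch. The 6 pt type size and the 150 and 300 dpi scan resolutions are assumptions for illustration only; the thread does not say how the PDF was scanned. The only fixed fact used is that one typographic point is 1/72 inch, and the result is the nominal type height, so the actual letter shapes are smaller still.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 inch


def nominal_text_height_px(point_size, scan_dpi):
    """Nominal height in pixels of `point_size` pt type scanned at `scan_dpi`."""
    return point_size * scan_dpi / POINTS_PER_INCH


if __name__ == "__main__":
    # Hypothetical values: 6 pt bibliography, two possible scan resolutions.
    for dpi in (150, 300):
        print(f"{dpi} dpi: {nominal_text_height_px(6, dpi):.1f} px")
```

Resampling a low-resolution scan afterwards raises the pixel count, but each letter still carries only the detail captured when the page was first scanned.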


 
James Greenfield
United Kingdom
Member (since 2013)
French to English
TOPIC STARTER
Thanks Nov 29, 2015

Hi Melissa,

Yes, I think that's right. This section is in English anyway, so I have decided not to include it. I thought about including it because it is the bibliography and the French text refers to these English journals, but as you say there is no way of increasing the resolution, and typing it out by hand would take me an awfully long time.

James


 
Anton Konashenok
Czech Republic
French to English
Do you really need to type it? Nov 30, 2015

If the list of references is already in the target language anyway, it makes sense to ask the client if they'd accept it as a pasted image instead of text. If so, you can just copy it using the Snapshot tool of Adobe Reader, then paste it into your target document.

 
esperantisto
Member (since 2006)
English to Russian
SITE LOCALIZER
Convert to black and white Nov 30, 2015

In my experience, increasing the resolution above 300 dpi has no noticeable effect on recognition results, even for small print. However, there is one setting (off by default) that can be useful: Tools → Options → General → More options… → Convert color/gray-scale images to black and white (I am translating these menu items from the Russian UI of FR 8.0, so they may be different in your case). Try it switched on.

Also, if the sections in question are French only, do select French only for the language and (re)recognize.
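
For readers who want to see what a black-and-white conversion does, here is a minimal sketch using Otsu's method, a common way of picking a threshold automatically. This is an assumption for illustration: the thread does not say how FineReader performs its conversion. The functions `otsu_threshold` and `binarize` are hypothetical helpers, and the image is represented as plain rows of grey levels (0 = black, 255 = white).

```python
def otsu_threshold(pixels):
    """Return the grey level t (0-255) that maximises between-class variance.

    Pixels <= t count as dark (ink), pixels > t as light (paper).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue  # no dark pixels yet
        if w_light == 0:
            break     # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(rows):
    """Map a greyscale image (list of rows) to pure black (0) and white (255)."""
    t = otsu_threshold([p for row in rows for p in row])
    return [[255 if p > t else 0 for p in row] for row in rows]


if __name__ == "__main__":
    sample = [[30, 40, 200], [210, 35, 220]]  # made-up grey levels
    print(binarize(sample))
```

The idea is that faint grey fringes around small letters are forced to either ink or paper, which can make thin strokes easier to separate from the background.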


 
Tom in London
United Kingdom
Member (since 2008)
Italian to English
No problem Nov 30, 2015

James Greenfield wrote:

Hi,

I am currently translating a dead PDF. I managed to OCR the document and the results were fine apart from the bibliography section at the end, which is in very small print. The results for this section make no sense. Using ABBYY FineReader I tried to increase the resolution, but the results were just as bad. Does anyone have any advice? This is the first time I have had this problem. Perhaps someone with ABBYY FineReader could guide me on how to increase the resolution properly. When I try to do this, the image size automatically becomes smaller and it is still unable to recognise the text. Many thanks for any advice.


I don't know about you, James, but my ABBYY FineReader for macOS outputs to plain text. The resulting file can then be opened in Word and saved as a .doc file. Then you can alter the text any way you want to. I do this all the time.

[Edited at 2015-11-30 07:51 GMT]


 
Rolf Keller
Germany
English to German
Enlarge the picture externally Dec 1, 2015

esperantisto wrote:

In my experience, increasing the resolution above 300 dpi has no noticeable effect on recognition results even for small print.


Ack.

Convert color/gray-scale images to black and white


Ack.

Plus plan C:
Enlarge the picture beforehand.

If need be, go to a copy shop, make an enlarged copy, try different contrast settings, etc., then scan/export the result onto a USB stick. The shop staff will help you with this.

Back in your office, OCR the file on the stick.
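
A purely digital stand-in for the enlargement step can be sketched as below. It is not equivalent to an optical enlargement at a copy shop: it adds no detail and only gives each letter more pixels, which may or may not help a given OCR engine. The function `enlarge` is a hypothetical helper that works on an image held as plain rows of grey levels and uses nearest-neighbour scaling by a whole-number factor.

```python
def enlarge(rows, factor):
    """Scale an image (list of pixel rows) by an integer factor.

    Nearest-neighbour scaling: every pixel becomes a factor x factor block.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for row in rows:
        # Repeat each pixel horizontally, then repeat the widened row vertically.
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


if __name__ == "__main__":
    tiny = [[0, 255], [255, 0]]  # made-up 2 x 2 image
    for line in enlarge(tiny, 2):
        print(line)
```

Enlarging first and converting to black and white afterwards, or the other way round, are both worth trying on a single page before processing the whole bibliography.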


 

