How can I count words in PDF files?
Thread poster: suesimons
suesimons  Identity Verified
Local time: 06:59
Portuguese to English
Apr 12, 2006

I'm sure this has been asked before but how do I count the words in a .pdf document?

[Subject edited by staff or moderator 2006-04-12 18:48]


 
Giles Watson  Identity Verified
Italy
Local time: 07:59
Italian to English
In memoriam
It certainly has... Apr 12, 2006

... and there are plenty of links here:

http://www.proz.com/post/326526#326526

You can find more relevant messages by typing "pdf" or "pdf count" in the "Search forums" box in the top right-hand corner of this page.

HTH

Giles


 
Kristine Sprula (Lielause)  Identity Verified
Latvia
Local time: 08:59
Full member (since 2005)
English to Latvian
It depends.... Apr 12, 2006

suesimons wrote:

I'm sure this has been asked before but how do I count the words in a .pdf document?

[Subject edited by staff or moderator 2006-04-12 18:48]


If the file was created directly as a .pdf document, one option is to copy the text into Word; another is to use a dedicated word-count program.
But if the file was created as a picture (the text was scanned and then saved as a .pdf document), the only option is manual counting.
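
Whether a PDF is text-based or scanned can usually be checked before choosing a method. A minimal Python sketch, assuming the text of each page has already been extracted into a list of strings by some PDF tool (the extraction step is not shown; the function name and the `min_words` threshold are made up for illustration):

```python
def find_image_only_pages(page_texts, min_words=5):
    """Return 1-based numbers of pages that look like scanned images.

    page_texts: one string (or None) per page, as produced by whatever
    PDF text extractor you use (hypothetical input, no specific tool).
    A page with fewer than min_words words is assumed to have no usable
    text layer, so it would need OCR or manual counting.
    """
    suspect = []
    for number, text in enumerate(page_texts, start=1):
        if len((text or "").split()) < min_words:
            suspect.append(number)
    return suspect


# Illustrative data: one normal page followed by two empty ones.
pages = ["This page has a normal text layer with plenty of words.", "", "   "]
print(find_image_only_pages(pages))
```

Pages reported by this check are the ones where copy-and-paste will give nothing useful.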

Regards,
Kristine


 
Marisa Condurso de Nohara  Identity Verified
Argentina
Local time: 02:59
English to Spanish
Word and ABBy Apr 12, 2006

Kristine Lielause wrote:

... one option is copying text to Word.....

But if the file has been created as a picture.....

Kristine


I would like to add something to Kristine's suggestion:

I usually do it by copying and pasting into Word, but be careful! Some words may become joined, and text inside objects won't be counted. So before clicking on "Word Count", look over the whole Word document to separate any joined words, and count the objects separately.
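
For anyone who prefers a script to checking by eye, here is a rough Python sketch of the same idea: count whitespace-separated words in the pasted text and flag tokens that look like two words run together. The heuristics (a lower-case letter followed by a capital, or a very long token) and the `max_len` value are assumptions for illustration and will miss some joins, such as "contractshall".

```python
import re

# Lower-case letter followed (optionally via punctuation) by an
# upper-case one, e.g. "agreementThe" or "parties.Each".
JOIN_RE = re.compile(r"[a-z][.,;:]?[A-Z]")


def count_words(text):
    """Count whitespace-separated tokens, roughly as a word processor does."""
    return len(text.split())


def suspicious_tokens(text, max_len=20):
    """Return tokens that may be several words run together."""
    return [
        tok for tok in text.split()
        if len(tok) > max_len or JOIN_RE.search(tok)
    ]


# Illustrative sample with two joined words.
sample = "Signed by both parties.Each party keeps one copy of the agreementThe end"
print(count_words(sample))
print(suspicious_tokens(sample))
```

Names such as "McNohara" would also be flagged, so the list is only a prompt for a manual check.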


Secondly, when the PDF has been created as a picture, you could use AbbyFinder to turn "objects" (a copied PDF page) into "words", but to tell the truth, I am not sure what happens when the PDF's text is very long. I usually use AbbyF for individual pictures that contain words.

Hope it helps!
McN


 
ddelvecchio
Local time: 07:59
English to Italian
Practicount Apr 22, 2006

Hello!!

If the file isn't an image, I use Practicount&Invoice, a really nice and simple program that counts words in every type of document.
It also generates invoices and does many other things.

You can download a shareware version here:
http://www.practiline.com/download.htm

Bye!!
Davide


 
aitteam
Ukraine
Local time: 08:59
Full member (since 2009)
English to Ukrainian
Word count in pdf, images, and 30 more file formats Jul 28, 2009

Hello,

We have just released a new version of our word count software. It is called AnyCount and is used by more than 5000 people worldwide. I am sure colleagues on the forum can give their opinion on its pros and cons.

I will only mention a new feature of version 7: word count in BMP, JPG, PNG, and GIF files.

Best,
Vladimir.


 
Anna Villegas
Mexico
Local time: 23:59
English to Spanish
Try this one Jul 29, 2009

http://www.globalrendering.com/download.html

It is a good tool.



 
Samuel Murray  Identity Verified
Netherlands
Local time: 07:59
Full member (since 2006)
English to Afrikaans
Here's how Jul 29, 2009

suesimons wrote:
How do I count the words in a .pdf document?


1. In your PDF viewer, press Ctrl+A and Ctrl+C, and then in MS Word, press Ctrl+V. If you can see the text, count it. If you can't see the text, go to step 2.

2. Use a good, expensive OCR program to convert the PDF into MS Word, and then use CompleteWordCount to count the text. If you don't want to use OCR, go to step 3.

3. Count the way we counted in the old days, by counting a few average lines and then multiplying the average by the average number of lines per page and the number of pages.

http://www.shaunakelly.com/word/CompleteWordCount/
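
The old-style estimate in step 3 is just a multiplication. A minimal Python sketch, where the function name and the sample figures are invented for illustration:

```python
def estimate_word_count(sample_line_counts, lines_per_page, pages):
    """Estimate total words from a hand-counted sample of lines.

    sample_line_counts: words counted in a few typical lines.
    lines_per_page: average number of lines on a page.
    pages: number of pages in the document.
    """
    if not sample_line_counts:
        raise ValueError("count at least one sample line")
    avg_words_per_line = sum(sample_line_counts) / len(sample_line_counts)
    return round(avg_words_per_line * lines_per_page * pages)


# Hypothetical figures: five sampled lines, 40 lines per page, 12 pages.
print(estimate_word_count([11, 13, 12, 10, 14], lines_per_page=40, pages=12))
```

The more lines you sample, the less a single unusually short or long line skews the average.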


 
Michael GREEN  Identity Verified
France
Local time: 07:59
English to French
Agree with Samuel Jul 30, 2009

... on all points.

I would just add that if the file is an image, I usually print it and then scan it using the OCR function of my scanner.

In any event, pdf files necessarily mean extra time taken to prepare the source files, and I invoice that extra time to my customers (having made it clear that this is how I work before the order is confirmed).


 
Tony M
France
Local time: 07:59
Member
French to English
SITE LOCALIZER
Only for text-based PDFs? Jul 31, 2009

Tadzio Carvallo wrote:
Try this one:
http://www.globalrendering.com/download.html


Yes, but as far as I can ascertain from that website, it still only seems to count words in PDF files created directly from native text formats; so it still can't solve the problem of what to do when the PDF is in fact an image from some scanned document etc.

Like Michael G., I have occasionally had to resort to printing out the file and then OCRing it, which really does seem a roundabout way of doing things! It is also a problem with poorer-quality originals, particularly those with fine print. However, the absolute accuracy of the OCR is fairly unimportant, as long as on average it produces about the right number of words. In my experience, it's a case of 'swings and roundabouts', and the end result is usually accurate enough. After all, it is hardly cost-effective to spend a lot of time producing a to-the-word accurate wordcount, since any discrepancy is likely to be fairly small.

As I translate mainly from FR > EN, I sometimes agree with the customer to base my charging on target word count + a percentage; generally, 10% seems about right for FR>EN, though in a statistical analysis I once did of quite a large number of files, I noticed variations from –16% to +5% in the FR > EN wordcount difference, so the variability is quite large! But I find most customers don't argue with 10% (they can see for themselves that the EN takes up less space!), and it's not really worthwhile wasting time trying to get greater accuracy.
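
As a rough illustration of the arithmetic, here is a small Python sketch. The function names are made up, and the second function rests on an assumption about how the figures were measured: it treats the –16% to +5% difference as the EN count relative to the FR count.

```python
def billable_words(target_words, uplift=0.10):
    """Target word count plus an agreed percentage (10% by default)."""
    return round(target_words * (1 + uplift))


def implied_source_range(target_words, low=-0.16, high=0.05):
    """Range of source counts consistent with a given target count.

    Assumption: target = source * (1 + d), with d between low and high.
    """
    return (round(target_words / (1 + high)),
            round(target_words / (1 + low)))


# Hypothetical job: the EN translation comes to 1000 words.
print(billable_words(1000))
print(implied_source_range(1000))
```

Under that assumption, the flat 10% sits inside the observed spread, which is why it works as a compromise.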

In passing, I'd just like to mention one customer who requested specifically that I not reduce my EN translation by more than 5% compared to the FR, for DTP reasons! Better still, this particular customer pays me by target wordcount anyway! However, in the particular field I was working in, it was actually extremely difficult to comply!


 
Igor Moshkin
Russian Federation
Local time: 12:59
English to Russian
FineCount Jul 31, 2009

Try FineCount - http://www.tilti.com/tilti-com.software.finecount?pc_code=F97961DA6D40A&ver=2.5.1.1766
It's free, though it requires registration. In addition to the word count, this software provides plenty of other useful information, including invoices.


 
CHEN-Ling  Identity Verified
Local time: 13:59
Chinese to English
OCR Aug 13, 2009

Michael GREEN wrote:
... on all points.

I would just add that if the file is an image, I usually print it and then scan it using the OCR function of my scanner.

In any event, pdf files necessarily mean extra time taken to prepare the source files, and I invoice that extra time to my customers (having made it clear that this is how I work before the order is confirmed).


This is what I wanted to say. Actually, a single piece of OCR software, such as Shocr 7.0, is enough. I usually save the PDF file as TIFF files first, then open the saved TIFF files in Shocr 7.0 and convert them into text. Finally, I copy the text into a Word file and count.


 
Pierre Fleutot
Argentina
Local time: 02:59
English to French
Excellent Dec 29, 2009

Tadzio Carvallo wrote:

http://www.globalrendering.com/download.html

It is a good tool.



SO easy to use (no install). Ten out of ten!


 
Tam Nguyen
Vietnam
Local time: 12:59
English to Vietnamese
count words in PDF Jan 22, 2010

PierreF wrote:

Tadzio Carvallo wrote:

http://www.globalrendering.com/download.html

It is a good tool.



SO easy to use (no install). Ten out of ten!


Tried it! That's OK and easy.


 
Virginia canvas
United States
Local time: 22:59
French to English
Second vote for PractiCount Mar 26, 2010

We have been using PractiCount for a couple years and love it. It's easy to use. It's versatile and customizable. And the counts are quite accurate (even for Asian chars, lines, pages, etc.). It counts almost every file format I have needed: Word, Excel, PPT, PDF, HTML..... PractiCount also offers the flexibility to export reports or professional-looking invoices, if you need those features.

A customer just sent us a new PDF document complete with embedded CAD drawings.
I tried the standard route of Adobe Acrobat's save-as function. Low-ball word count because most of the images remained images - not editable text.

Next I tried the OCR tool ABBYY PDF Transformer (another tool I love!!). Fair results. At least ABBYY converted most of the images to text, but it still looked incomplete for estimating purposes.

Then I resorted to PractiCount. Somehow PractiCount came up with a count 2000 words higher than either of the other two approaches.


Note: Over the years, I have found that the success of OCR tools varies with the nature of the image and layout. ABBYY seems to be among the best (especially for foreign or multi-language docs and for retaining layout that a translator can use). But not always. Sometimes OmniPage or another OCR tool simply has better luck for a creative design layout. It seems to be a matter of trial and error with those scans or embedded images.

Good luck,
- Virginia Anderson

Oregon Translation, LLC
Building cooperative relationships with translators.
Apply as a translator here: www.oregontranslation.com