matching rate Thread poster: Krzysztof Kożurno
|
Hello,
Thank you again for everyone's suggestions and help.
I'm working on a new job and new curiosities puzzle me.
There is a segment:
Laws and regulations
TM match:
and
the match rate: 64%
Horribly overrated to the naked eye, isn't it?
How come?
Thank you in advance for any suggestions.
Best regards,
Krzysztof | | | The algorithm is faulty | Jan 4, 2013 |
big_fish wrote:
Thank you again for everyone's suggestions and help.
I'm working on a new job and new curiosities puzzle me.
There is a segment:
Laws and regulations
TM match:
and
the match rate: 64%
Horribly overrated to the naked eye, isn't it?
How come?
Thank you in advance for any suggestions.
memoQ has serious matching problems for short segments.
E.g. when one word differs in a two-word sentence (e.g. "It rains" and "It happens"), memoQ says the match rate is 65%, while the simplest solution, i.e. 50%, seems logical.
I.e. for some specific projects, such as parts lists, the analysis may be completely screwed up.
It's a very old bug.
AFAIR I pointed it out two years ago.
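For anyone who wants to check the "simplest solution" by hand, here is a minimal sketch. It assumes the match rate is simply the share of words left unchanged, measured with a word-level edit distance against the longer segment. The function names are made up for illustration, and this is not memoQ's actual algorithm, which is not described in this thread.

```python
def word_edit_distance(a, b):
    """Levenshtein distance computed over word lists instead of characters."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,           # delete a word from a
                            curr[j - 1] + 1,       # insert a word from b
                            prev[j - 1] + cost))   # keep or substitute
        prev = curr
    return prev[-1]


def naive_match_rate(source, tm_source):
    """Hypothetical baseline: unchanged words relative to the longer segment."""
    a, b = source.split(), tm_source.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - word_edit_distance(a, b) / longest)


print(naive_match_rate("It rains", "It happens"))  # 50.0
```

Under the same assumption, "Laws and regulations" against a TM entry "and" comes out at roughly 33%, well below the 64% reported at the top of the thread.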
Cheers
GG
[Edited at 2013-01-04 10:31 GMT] | | | Good to know for jobs with short-segments and low match rates | Jan 4, 2013 |
Grzegorz Gryc wrote:
E.g. when one word differs in a two-word sentence (e.g. "It rains" and "It happens"), memoQ says the match rate is 65%, while the simplest solution, i.e. 50%, seems logical.
I.e. for some specific projects, such as parts lists, the analysis may be completely screwed up.
It's a very old bug.
AFAIR I pointed it out two years ago.
Cheers
GG
[Edited at 2013-01-04 10:31 GMT]
Thank you Grzegorz!
It's important to know, especially as some jobs may get horribly underpaid this way.
You have to keep an eye on the rates for hits between 60 and 70% for short segments.
In fact the example you give ("It rains" and "It happens") in Polish (and I believe in very many other languages too) is no match, as these would be two distinct sentences with nothing in common.
It's important to keep this glitch in mind in these cost-saving times.
Best regards,
Krzysztof | | | Dates and numbers | Jan 4, 2013 |
I've noticed that numbers and dates will often be treated as relatively high matches for other numbers or dates, or even each other, and even if the number of digits, formatting, separators, etc. is completely different. | |
|
|
Text recognition algorithms again... | Jan 4, 2013 |
big_fish wrote:
Grzegorz Gryc wrote:
E.g. when one word differs in a two-word sentence (e.g. "It rains" and "It happens"), memoQ says the match rate is 65%, while the simplest solution, i.e. 50%, seems logical.
I.e. for some specific projects, such as parts lists, the analysis may be completely screwed up.
It's a very old bug.
AFAIR I pointed it out two years ago.
It's important to know, especially as some jobs may get horribly underpaid this way.
Generally, you should know that one is very often underpaid when going by the memoQ word count.
I.e. in your language pairs it's not a very big problem; the difference in word count is usually negligible, but e.g. for FR-PL it may reach approx. 15% by default.
The problem is the word definition in memoQ: it corresponds to a Word-like word count, i.e. a word is a character chain between spaces (or equivalent). Most tools use some GMX-V-like word definition, i.e. word separators (apostrophes, dashes etc.) are also used.
E.g., for memoQ, 1-Chloro-2,4-dinitrobenzene is one word while most tools would show more, e.g. 4 words in Trados or 2 words in Déjà Vu (DVX doesn't count numerals as words).
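To make the difference concrete, here is a rough sketch of three counting rules that happen to reproduce the figures above (1, 4 and 2). The separator set and the "no letters means numeral" test are my assumptions, chosen to fit this one example; they are not the documented rules of memoQ, Trados or Déjà Vu.

```python
import re

# Assumed separator set: whitespace, apostrophes, hyphens and dashes.
# Commas are deliberately NOT separators here, so "2,4" stays one token.
SEPARATORS = re.compile(r"[\s'’\-–—]+")


def count_space_delimited(text):
    """Word-like count: a word is a character chain between spaces."""
    return len(text.split())


def split_on_separators(text):
    """Split on the assumed separator set and drop empty pieces."""
    return [t for t in SEPARATORS.split(text) if t]


def count_on_separators(text):
    """Separator-based count: hyphens and apostrophes also split words."""
    return len(split_on_separators(text))


def count_without_numerals(text):
    """Separator-based count that skips tokens containing no letters."""
    return sum(1 for t in split_on_separators(text)
               if any(c.isalpha() for c in t))


name = "1-Chloro-2,4-dinitrobenzene"
print(count_space_delimited(name))   # 1
print(count_on_separators(name))     # 4
print(count_without_numerals(name))  # 2
```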
Nonetheless, IMO, unlike many Trados "features", it's not an intent to cheat, it's just a fundamental error in the memoQ design.
E.g., this kind of word definition makes memoQ barely usable for some types of jobs, e.g. segments containing chemical compound names like:
1-Chloro-2,4-dinitrobenzene
1-Chloro-3,4-dinitrobenzene
will not be recognized as similar by memoQ even if you lower the threshold to 10% (sic!, ten percent).
Of course, it will also screw up the match level for larger segments but it will be less visible.
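A small sketch of why the word definition matters here. It compares the two names token by token, once with space-delimited tokens and once with separator-delimited tokens. Python's `difflib.SequenceMatcher` is used only as a stand-in similarity measure, and the separator set is an assumption; neither is claimed to be what memoQ does internally.

```python
import re
from difflib import SequenceMatcher


def space_tokens(text):
    """A token is a character chain between spaces."""
    return text.split()


def separator_tokens(text):
    """Assumed rule: hyphens, dashes and apostrophes also split tokens."""
    return [t for t in re.split(r"[\s'’\-–—]+", text) if t]


def token_similarity(a_tokens, b_tokens):
    """Stand-in score: SequenceMatcher ratio over token lists, in percent."""
    return 100.0 * SequenceMatcher(None, a_tokens, b_tokens).ratio()


a = "1-Chloro-2,4-dinitrobenzene"
b = "1-Chloro-3,4-dinitrobenzene"

# One token each, and the two tokens differ: nothing in common.
print(token_similarity(space_tokens(a), space_tokens(b)))          # 0.0

# Four tokens each, three of them identical.
print(token_similarity(separator_tokens(a), separator_tokens(b)))  # 75.0
```

With whole-name tokens the two segments share nothing, which would be consistent with no hit even at a very low threshold; with finer tokens, three of the four pieces line up.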
You have to keep an eye on the rates for hits between 60 and 70% for short segments.
Frankly speaking, almost everything below 70% should be considered (i.e. paid) as no match...
So why does Trados Studio artificially pump up the wordcount? I.e. the match rate is usually approx. 30% higher (relative value) than the old Trados match rates.
E.g. when two words differ in a 5-word sentence, the old Trados shows a 60% match, the new one claims it's a 72 or 73% match, which is obviously absurd for sentences like "The Silence of the Lambs" and "The Voice of the Martyrs"...
In fact the example you give ("It rains" and "It happens") in Polish (and I believe in very many other languages too) is no match, as these would be two distinct sentences with nothing in common.
Yep, obviously.
E.g. in French it corresponds to "Il pleut" and "Ça arrive" etc.
It's important to keep this glitch in mind in these cost-saving times.
Most people don't care about algorithms, but it's useful.
Cheers
GG | | |
Rudolf Vedo CT wrote:
I've noticed that numbers and dates will often be treated as relatively high matches for other numbers or dates, or even each other, and even if the number of digits, formatting, separators, etc. is completely different.
The same with tags.
I didn't analyze it thoroughly, i.e. I'm unable to quantify it, but it seems memoQ in some way follows the Trados behaviour, where the numeral weight is twice the word weight.
Trados pollutes reason.
Cheers
GG | | | Apparently no improvement yet | Feb 24, 2015 |
This is an update saying that Kilgray seem to have made no progress regarding these problems right in the very heart of their software. I see a similar behavior in memoQ 2014 R2:
DE: Summe
DE TM: Indian Summer
Match: 69%
EN: Meta-analysis (2)
EN TM: -5
Match: 73%
EN: Effector T Cell
EN TM: T:
Match: 65%
On the other hand, memoQ gives only a 90% match rate for a 33-word (168-character) sentence from the TM identical to the source segment except for one different symbol/letter and four different formatting tag pairs.
You can find forum discussions on the internet dating from 2011 where someone from Kilgray admits to the severity of this problem, saying that they have an "extreme bottleneck for any TM engine related development/bugfixing" and that they will try to fix this as soon as possible.
It is a pity that this hasn't happened yet. I was so hoping that I could get rid of this behavior when recently changing from another industry-leading CAT tool to memoQ. | | | over-reliance on technology | Feb 24, 2015 |
I did not expect to revisit this discussion years after the initial post.
Even though translation aids have brought a new quality to the way we work, the over-reliance on technology has not brought any improvement in the quality of translation output.
The software does not make translators richer either.
Translators are cogs in a machine. Translation companies are supervisors. Customers watch their bills.
Who's happier because of this?