matching rate (MemoQ support)

Thread poster: Krzysztof Kożurno

Krzysztof Kożurno

Poland
English to Polish

Dec 14, 2012

Hello,
Thank you again for everyone's suggestions and help.
I'm working on a new job and new curiosities puzzle me.
There is a segment:
Laws and regulations
TM match:
and
the match rate: 64%
Horribly overrated to the naked eye, isn't it?
How come?
Thank you in advance for any suggestions.
Best regards,
Krzysztof

Grzegorz Gryc

French to Polish

The algorithm is faulty

Jan 4, 2013

big_fish wrote:

Thank you again for everyone's suggestions and help.
I'm working on a new job and new curiosities puzzle me.
There is a segment:
Laws and regulations
TM match:
and
the match rate: 64%
Horribly overrated to the naked eye, isn't it?
How come?
Thank you in advance for any suggestions.

memoQ has serious matching problems for short segments.
E.g. when one word differs in a two-word sentence ("It rains" vs. "It happens"), memoQ says the match rate is 65%, while the simplest solution, i.e. 50%, seems logical.
I.e. for some specific projects, such as parts lists, the analysis may be completely screwed up.
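
To make the "simplest solution" concrete, here is a minimal sketch of a word-level match rate based on edit distance. It is not memoQ's algorithm (which is not described in this thread); the function names `word_edit_distance` and `naive_match_rate` are made up for illustration.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete wa
                            curr[j - 1] + 1,      # insert wb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def naive_match_rate(source, tm_source):
    """Share of words left unchanged, as a percentage of the longer segment."""
    a, b = source.lower().split(), tm_source.lower().split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - word_edit_distance(a, b) / longest)


print(naive_match_rate("It rains", "It happens"))
print(naive_match_rate("Laws and regulations", "and"))
```

On this naive measure "It rains" / "It happens" comes out at 50%, and the "Laws and regulations" / "and" pair from the first post at roughly 33%, both well below the rates memoQ reported.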

It's a very old bug.
AFAIR I pointed it out two years ago.

Cheers
GG

[Edited at 2013-01-04 10:31 GMT]

Krzysztof Kożurno

Poland
English to Polish

Thread starter

Good to know for jobs with short segments and low match rates

Jan 4, 2013

Grzegorz Gryc wrote:

E.g. when one word differs in a two-word sentence ("It rains" vs. "It happens"), memoQ says the match rate is 65%, while the simplest solution, i.e. 50%, seems logical.
I.e. for some specific projects, such as parts lists, the analysis may be completely screwed up.

It's a very old bug.
AFAIR I pointed it out two years ago.

Cheers
GG

[Edited at 2013-01-04 10:31 GMT]

Thank you Grzegorz!
It's important to know, especially as some jobs may get horribly underpaid this way.
You have to keep an eye on the rates for hits between 60 and 70% for short segments.
In fact the example you give ("It rains" and "It happens") is no match at all in Polish (and, I believe, in very many other languages too), as these would be two distinct sentences with nothing in common.
It's important to keep this glitch in mind in these cost-saving times.
Best regards,

Krzysztof

LEXpert

United States
Member (since 2008)
Croatian to English

Dates and numbers

Jan 4, 2013

I've noticed that numbers and dates are often treated as relatively high matches for other numbers or dates, or even for each other, even when the number of digits, formatting, separators, etc. are completely different.

Grzegorz Gryc

French to Polish

Text recognition algorithms again...

Jan 4, 2013

big_fish wrote:

It's important to know, especially as some jobs may get horribly underpaid this way.

Generally, you should know that one is very often underpaid when the memoQ wordcount is used.

I.e. in your language pairs it's not a very big problem, the difference in word count is usually negligible, but e.g. for FR-PL it may reach approx. 15% by default.
The problem is the word definition in memoQ: it corresponds to a Word-like wordcount, i.e. a word is a character chain between spaces (or equivalent), whereas most tools use some GMX-V-like word definition, i.e. word separators (apostrophes, dashes etc.) also split words.
E.g., for memoQ, 1-Chloro-2,4-dinitrobenzene is one word while most tools would show more, e.g. 4 words in Trados or 2 words in Déjà Vu (DVX doesn't count numerals as words).
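
A rough sketch of the two word definitions, just to show where the different counts can come from. The separator set (whitespace, hyphens, apostrophes) and the numeral rule are assumptions picked for illustration; the real rules of memoQ, Trados and Déjà Vu are not given in this thread, and the function names are hypothetical.

```python
import re


def count_whitespace_words(text):
    """Word-like count: a word is any character chain between spaces."""
    return len(text.split())


def count_separator_words(text, count_numerals=True):
    """Separator-based count: hyphens and apostrophes also split words (assumed set)."""
    tokens = [t for t in re.split(r"[\s\-'’]+", text) if t]
    if not count_numerals:
        # Drop tokens made only of digits and decimal/list separators.
        tokens = [t for t in tokens if not re.fullmatch(r"[\d.,]+", t)]
    return len(tokens)


name = "1-Chloro-2,4-dinitrobenzene"
print(count_whitespace_words(name))                       # whitespace-only definition
print(count_separator_words(name))                        # separators split the name
print(count_separator_words(name, count_numerals=False))  # numerals not counted
```

With these assumed rules the compound name gives 1, 4 and 2 words respectively, in line with the figures quoted above, and a French form such as "l'arbre" counts as one word under the first definition and two under the second.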

Nonetheless, IMO, unlike many Trados "features", it's not an attempt to cheat, it's just a fundamental error in the memoQ design.

E.g., this kind of word definition makes memoQ barely usable for some types of jobs, such as segments containing chemical compound names like:
1-Chloro-2,4-dinitrobenzene
1-Chloro-3,4-dinitrobenzene
These will not be recognized as similar by memoQ even if you lower the threshold to 10% (sic! ten percent).
Of course, it will also screw up the match level for larger segments but it will be less visible.
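
The effect can be reproduced with Python's standard `difflib.SequenceMatcher`, used here only as a stand-in similarity measure (it is not what memoQ uses internally, as far as this thread shows). When each name is a single whitespace-delimited "word", a word-level comparison finds nothing in common, while a character-level comparison sees a one-character difference. The helper names are hypothetical.

```python
from difflib import SequenceMatcher


def word_level_similarity(a, b):
    """Compare two segments as sequences of whitespace-delimited words."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def char_level_similarity(a, b):
    """Compare two segments character by character."""
    return SequenceMatcher(None, a, b).ratio()


seg_a = "1-Chloro-2,4-dinitrobenzene"
seg_b = "1-Chloro-3,4-dinitrobenzene"

print(f"word level: {word_level_similarity(seg_a, seg_b):.2f}")
print(f"char level: {char_level_similarity(seg_a, seg_b):.2f}")
```

The word-level score is zero because the only "word" in each segment differs, whereas the character-level score is above 0.95, which is what a human would expect for these two names.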

You have to keep an eye on the rates for hits between 60 and 70% for short segments.

Frankly speaking, almost everything below 70% should be considered (i.e. paid) as no match...
That's why it matters that Trados Studio artificially pumps up the figures, i.e. its match rate is usually approx. 30% higher (in relative terms) than the old Trados match rates.
E.g. when two words differ in a five-word sentence, the old Trados shows a 60% match, while the new one claims it's a 72 or 73% match, which is obviously absurd for sentences like "The Silence of the Lambs" and "The Voice of the Martyrs"...

In fact the example you give ("It rains" and "It happens") is no match at all in Polish (and, I believe, in very many other languages too), as these would be two distinct sentences with nothing in common.

Yep, obviously.
E.g. in French it corresponds to "Il pleut" and "Ça arrive" etc.

It's important to keep this glitch in mind in these cost-saving times.

Most people don't care about algorithms, but it's useful to know.

Cheers
GG

Grzegorz Gryc

French to Polish

And tags...

Jan 4, 2013

Rudolf Vedo CT wrote:

I've noticed that numbers and dates will often be treated as relatively high matches for other numbers or dates, or even each other, and even if the number of digits, formatting, separators, etc. is completely different.

The same with tags.
I didn't analyze it thoroughly, i.e. I'm unable to quantify it, but it seems memoQ follows in some way the Trados behaviour, where a numeral weighs twice as much as a word.
Trados pollutes reason.
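
A toy model of what such a weighting would do, assuming only the 2:1 ratio mentioned above; everything else (position-by-position comparison, the numeral pattern, the names `token_weight` and `weighted_match_rate`) is hypothetical and not taken from either tool.

```python
import re

NUMERAL_WEIGHT = 2.0  # assumption from the post: a numeral counts twice as much as a word
WORD_WEIGHT = 1.0


def token_weight(token, numeral_weight=NUMERAL_WEIGHT):
    """Weight of one token: numerals get numeral_weight, everything else WORD_WEIGHT."""
    is_numeral = re.fullmatch(r"\d+([.,]\d+)*", token) is not None
    return numeral_weight if is_numeral else WORD_WEIGHT


def weighted_match_rate(source, tm_source, numeral_weight=NUMERAL_WEIGHT):
    """Position-by-position comparison where matching tokens score their weight."""
    a, b = source.split(), tm_source.split()
    matched = sum(token_weight(x, numeral_weight) for x, y in zip(a, b) if x == y)
    total = max(sum(token_weight(t, numeral_weight) for t in a),
                sum(token_weight(t, numeral_weight) for t in b))
    return 100.0 * matched / total if total else 100.0


print(weighted_match_rate("Page 5", "Table 5"))                      # numerals weigh double
print(weighted_match_rate("Page 5", "Table 5", numeral_weight=1.0))  # all tokens equal
```

In this model a shared numeral lifts "Page 5" / "Table 5" from 50% to about 67%, which shows how short segments containing numbers could end up with inflated rates if such a weighting is in play.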

Cheers
GG

Dr. Matthias Schauen

Germany
Member (since 2007)
English to German

Apparently no improvement yet

Feb 24, 2015

This is an update saying that Kilgray seem to have made no progress regarding these problems right in the very heart of their software. I see a similar behavior in memoQ 2014 R2:

DE: Summe
DE TM: Indian Summer
Match: 69%

EN: Meta-analysis (2)
EN TM: -5
Match: 73%

EN: Effector T Cell
EN TM: T:
Match: 65%

On the other hand, memoQ gives only a 90% match rate for a 33-word (168-character) sentence from the TM identical to the source segment except for one different symbol/letter and four different formatting tag pairs.

You can find forum discussions on the internet dating from 2011 where someone from Kilgray admits to the severity of this problem, saying that they have an "extreme bottleneck for any TM engine related development/bugfixing" and that they will try to fix this as soon as possible.
It is a pity that this hasn't happened yet. I was so hoping that I could get rid of this behavior when recently changing from another industry-leading CAT tool to memoQ.

Krzysztof Kożurno

Poland
English to Polish

Thread starter

over-reliance on technology

Feb 24, 2015

I did not expect to revisit this discussion years after the initial post.
Even though translation aids have brought a new quality to the way we work, the over-reliance on technology has not brought any improvement in the quality of translation output.
The software does not make translators richer either.
Translators are cogs in a machine. Translation companies are supervisors. Customers watch their bills.
Who's happier because of this?
