Source term collector
Thread poster: CafeTran Trainer
Jul 6, 2024

Many CAT tools provide functions to list the frequent source terms of a project. This process usually produces a lot of garbage. Is there a program that looks only at the words immediately to the left and right of frequent nouns and then lists the resulting groups of two or three words?

 
CafeTran Trainer
Netherlands
Topic starter
Source fragment harvester Jul 7, 2024

I should have chosen "Source fragment harvester" as the subject.

Since there have been no replies to my post, here is an idea I've had since posting it:

Use a regular expression to extract the candidates.

Sort in Excel and delete the noise.
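The two steps above could be sketched roughly as follows. This is only a minimal illustration under some assumptions, not a feature of any CAT tool: the source text is plain text, only ASCII letters count as word characters, and `harvest_fragments` is a hypothetical helper name. It uses regular expressions to split the text at punctuation and into words, counts every group of two or three adjacent words, and prints tab-separated output that can be pasted into Excel for sorting and noise removal.

```shell
# harvest_fragments is a hypothetical helper name, not part of any CAT tool.
# Reads plain text on stdin and prints "count<TAB>fragment" for every
# two- or three-word fragment seen at least MIN times (default 2).
harvest_fragments() {
  awk -v min="${1:-2}" '
    {
      # Break the line at punctuation so fragments never span a comma or full stop.
      nchunks = split(tolower($0), chunk, /[.,;:!?()]+/)
      for (c = 1; c <= nchunks; c++) {
        # Assumption: ASCII letters only; anything else becomes a space.
        gsub(/[^a-z]+/, " ", chunk[c])
        n = split(chunk[c], w, " ")
        for (i = 1; i < n; i++) {
          count[w[i] " " w[i+1]]++
          if (i + 2 <= n) count[w[i] " " w[i+1] " " w[i+2]]++
        }
      }
    }
    END {
      for (f in count) if (count[f] >= min) printf "%d\t%s\n", count[f], f
    }
  ' | LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

For example, `harvest_fragments 3 < source.txt > candidates.tsv` would keep only fragments that occur at least three times (`source.txt` and `candidates.tsv` are placeholder file names). Fragments that begin or end with a function word still show up in the list and have to be removed by hand or with a further filter.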

Screenshot 2024-07-07 at 14.01.15

Screenshot 2024-07-07 at 14.00.59

[Edited at 2024-07-07 12:20 GMT]


 
CafeTran Trainer
Netherlands
Topic starter
Got this suggestion Jul 8, 2024

A kind person gave me this suggestion:

sed -E "s/( a| all| allows| are| at| in| for| of| to| with| on| by| or| the| and| is)$//"
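As far as I can tell, this command removes one trailing function word from the end of each line, so a candidate like "source term of" becomes "source term". A possible variation, sketched below, repeats the substitution until nothing more can be stripped, so that "translation memory of the" ends up as "translation memory". The loop (`:a` ... `ta`) and the helper name `trim_trailing_stopwords` are my own additions, not part of the suggestion, and the sketch assumes one lowercase candidate per line.

```shell
# trim_trailing_stopwords is a hypothetical helper name.
# Reads one candidate fragment per line on stdin and strips trailing
# function words, repeating until the line no longer ends in one.
trim_trailing_stopwords() {
  sed -E \
    -e ':a' \
    -e 's/( a| all| allows| are| at| in| for| of| to| with| on| by| or| the| and| is)$//' \
    -e 'ta'
}

# Example: the first line loses " the" and then " of";
# the second line is left unchanged.
printf 'translation memory of the\nsource term\n' | trim_trailing_stopwords
```

Because every alternative starts with a space, a line that consists of a single function word is left alone, and the match is case-sensitive, so "Of" or "The" would not be stripped.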


 

