Convert ae, oe, ue, ss to ä, ö, ü, ß where applicable
Thread poster: Hans Lenting
Netherlands
Member (2006)
German to Dutch
Jan 1, 2021

I have a list with about 40K entries of type (1), where ä, Ä, ö, Ö, ü, Ü and ß have been transcribed as ae, Ae, oe, Oe, ue, Ue and ss. But the list also contains entries of type (2), where ae, Ae, oe, Oe, ue, Ue and ss are not transcriptions of ä, Ä, ö, Ö, ü, Ü and ß.

Question: How can I correct entries of type (1) but leave entries of type (2) unmodified?

Ablesegeraet
Ablieferungspruefung (1)
Ablieferungspruefungen
abmeisseln
Abmessen
Abmessung
Abschaltfrequenz (2)
Abschaltreaktivitaet
Abschaltsteuerung
Abschaltverstaerker
abschiessen
Abschirmbehaelter (1)
Abschirmungsschlauch (2)
Abschlaege
Abschlaeger
Abschlaglaenge
Abschlagschuss
abschliessen


 
esperantisto  Identity Verified
Local time: 00:23
Member (2006)
English to Russian
SITE LOCALIZER
Spellcheck Jan 1, 2021

First, batch replace ae with ä, oe with ö etc. Then replace most obvious wrong replacements such as ßch to ssch. Run a spellchecker and correct as suggested.

Erik Freitag  Identity Verified
Germany
Local time: 23:23
Member (2006)
Dutch to German
Exactly Jan 1, 2021

esperantisto wrote:

First, batch replace ae with ä, oe with ö etc. Then replace most obvious wrong replacements such as ßch to ssch. Run a spellchecker and correct as suggested.


That'd be my advice, too. Type 1 errors with umlauts will be few and far between anyway. You'll have most of them covered by re-replacing "qü" with "que", "Qü" with "Que", "eü" with "eue", and "Eü" with "Eue". Then, as esperantisto suggests, do ßch->ssch. Correct what's left over with a spellchecker (preferably a good one; the old Duden spellchecker comes to mind).

You may be left with not as many manual corrections as one would think at first glance.
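The replace-then-revert routine described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `naive_convert` is made up, and the revert list holds just the patterns named in this thread, so whatever it misses still has to go through a spellchecker.

```python
# Blanket conversion, followed by the reverts suggested in this thread.
FORWARD = [("ae", "ä"), ("oe", "ö"), ("ue", "ü"),
           ("Ae", "Ä"), ("Oe", "Ö"), ("Ue", "Ü"), ("ss", "ß")]
REVERT = [("qü", "que"), ("Qü", "Que"), ("eü", "eue"), ("Eü", "Eue"),
          ("ßch", "ssch")]


def naive_convert(word: str) -> str:
    """Replace every digraph, then undo the most obvious wrong replacements."""
    for old, new in FORWARD:
        word = word.replace(old, new)
    for old, new in REVERT:
        word = word.replace(old, new)
    return word


for entry in ["Ablieferungspruefung", "Abschaltfrequenz",
              "Abschirmungsschlauch", "Abmessen"]:
    print(entry, "->", naive_convert(entry))
```

With the sample entries from the first post, Ablieferungspruefung is corrected and Abschaltfrequenz and Abschirmungsschlauch come back unchanged, but Abmessen turns into the wrong "Abmeßen". That last kind of error is what the spellchecker pass is for.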

Good luck!


Samuel Murray  Identity Verified
Netherlands
Local time: 23:23
Member (2006)
English to Afrikaans
@Hans Jan 2, 2021

Hans Lenting wrote:
I have a list with about 40 000 entries...
How can I correct entries of type (1) but leave entries of type (2) unmodified?


I'm afraid you're going to have to use a spell-checker, and it would have to be a spell-checker capable of checking compound nouns. Do you have such a spell-checker? I would be surprised if MS Word's spell-checker can't do this sort of thing.

Then it's a matter of removing mis-spelled words from the list, then doing conversions on those mis-spelled words, then removing the mis-spelled words from that list, and then you're left with a list of words that your spell-checker doesn't recognise with or without the conversion, which you'd have to check manually. One possible downside to this method (that you can work around, if you know of it) is that only one variant of a word will end up in the final list. So if for example both "ass" and "aß" are valid German words, then only one of them will end up in your list.
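As a rough sketch of that sorting procedure in Python: the set `KNOWN` below is a hypothetical stand-in for a real spell-checker (one that handles compounds), and the names `convert` and `triage` are invented for this example.

```python
# Digraph replacements applied to words the "spell-checker" rejects.
FORWARD = [("ae", "ä"), ("oe", "ö"), ("ue", "ü"),
           ("Ae", "Ä"), ("Oe", "Ö"), ("Ue", "Ü"), ("ss", "ß")]


def convert(word):
    for old, new in FORWARD:
        word = word.replace(old, new)
    return word


def triage(words, known):
    """Split words into accepted as-is, fixed by conversion, and manual."""
    accepted, corrected, manual = [], {}, []
    for word in words:
        if word in known:                  # spell-checker accepts it unchanged
            accepted.append(word)
        elif convert(word) in known:       # accepted only after conversion
            corrected[word] = convert(word)
        else:                              # rejected either way
            manual.append(word)
    return accepted, corrected, manual


# Assumption: a tiny word set standing in for a real spell-checker.
KNOWN = {"Abmessen", "Abschaltfrequenz", "Ablieferungsprüfung", "abschließen"}
accepted, corrected, manual = triage(
    ["Abmessen", "Ablieferungspruefung", "abschliessen",
     "Abschaltfrequenz", "Abschlaglaenge"],
    KNOWN,
)
print(accepted, corrected, manual)
```

Because a word that is already accepted is never converted, this shows the downside mentioned above: where both spellings are valid, only the unconverted one survives.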

I use a macro in MS Word from editorium.com that makes a list of mis-spelled words, although the macro does not remove those words from the original list (so you'd have to find a way of doing that). On a large document with many mis-spellings, your display could freeze until the macro has run its entire course. You can try to increase the speed by replacing line breaks with spaces temporarily. You may also benefit from a different macro (or second macro) that highlights mis-spelled words in the original list. I googled for it and found one that works for me, here. In addition, I confirm that this macro works in Excel 365 (at least, it works in French) -- it highlights whole cells, so you'd have to ensure you have one word per cell.

Samuel

[Edited at 2021-01-02 12:00 GMT]


Heinrich Pesch  Identity Verified
Finland
Local time: 00:23
Member (2003)
Finnish to German
Replace qu/Qu and ssch Jan 2, 2021

Replace these with special characters and then carry out the general replacement of ue -> ü, ss -> ß, etc. Afterwards, convert the special characters back.
I am happy with Word's spell-checker.
With ß you of course have to bear in mind that ß follows a diphthong but only rarely follows a single vowel. So I would convert iess to ieß across the board, and so on. Or the list is meant for Switzerland, in which case there is no ß at all.
In the end you will still have to check the list manually.
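A minimal Python sketch of this protect, replace, restore idea follows. The placeholder characters (taken from the Unicode private-use area) and the function name `convert_protected` are assumptions made for illustration. Only the sequences named in this post are protected.

```python
# Sequences to shield from the general replacement, each mapped to a
# placeholder character that should not occur in the word list.
PROTECT = {"qu": "\uE000", "Qu": "\uE001", "ssch": "\uE002"}
GENERAL = [("ae", "ä"), ("oe", "ö"), ("ue", "ü"),
           ("Ae", "Ä"), ("Oe", "Ö"), ("Ue", "Ü"), ("ss", "ß")]


def convert_protected(word):
    for seq, mark in PROTECT.items():      # 1. hide qu, Qu and ssch
        word = word.replace(seq, mark)
    for old, new in GENERAL:               # 2. general replacement
        word = word.replace(old, new)
    for seq, mark in PROTECT.items():      # 3. restore the hidden sequences
        word = word.replace(mark, seq)
    return word


for entry in ["Abschaltfrequenz", "Abschirmungsschlauch",
              "Abschlaege", "Abschaltsteuerung"]:
    print(entry, "->", convert_protected(entry))
```

With only these three sequences protected, Abschaltsteuerung still comes out as "Abschaltsteürung", so the eue case mentioned earlier in the thread would need its own placeholder, and a manual check remains necessary.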


Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
It was a lot of work Jan 3, 2021

Heinrich Pesch wrote:

In the end you will still have to check the list manually.


That is exactly how I did it. And I learned a new word along the way:

https://iate.europa.eu/search/standard/result/1609653594195/1

Rebate on the rebate. I think that says it all. This German word is perhaps doomed to perish. Curiously, there’s no entry for “good riddance”.


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Both forms Jan 4, 2021

Samuel Murray wrote:

One possible downside to this method (that you can work around, if you know of it) is that only one variant of a word will end up in the final list. So if for example both "ass" and "aß" are valid German words, then only one of them will end up in your list.


I used this list to fix misspellings in my downloaded copy of the IATE de_nl. Since I added the term pairs with the corrected spelling, the old ones, probably from the beginning of IATE, are still available.

On the other hand, there will be many term pairs where I incorrectly replaced an ae with ä, etc. For my purposes, that doesn't matter: the correct spelling forms are still available. I wonder whether the IATE will ever be corrected in this regard. Probably not, since that would be a gigantic operation.


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Another approach Jan 9, 2021

In order to reduce the number of words that I would have to check manually, I came up with this other approach:

From various sources I collected lists with correctly spelled German words. I placed them in one file of about 500K words. From this list I extracted all words with an ä, Ä, ö, Ö, ü, Ü or ß, resulting in a new list of about 76K words.

I changed all words in this list to lowercase and copied them to the second column of a spreadsheet. I then replaced all ä, ö, ü and ß in the 76K list with ae, oe, ue and ss and copied the result to the first column of the spreadsheet.

Finally, I used this spreadsheet to make case-adaptive replacements in the original list of 40K words with incorrect spelling.

So, using the 76K list I have entries like:

Screenshot 2021-01-09 at 09.55.52

and:

Screenshot 2021-01-09 at 09.56.06

And with this, I can correct words like:

Fuehrungsgelaende
Gelaendefuehrung
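
For readers who prefer a script to a spreadsheet, the approach can be approximated in Python as below. This is a sketch under assumptions, not the exact procedure used here: the names `build_table` and `apply_table` are invented, the three-word list stands in for the 500K reference list, and the table entries are assumed to be applied as substrings (which is how compounds such as Fuehrungsgelaende would be caught).

```python
import re

# Transliteration used to derive the "wrong" spelling from a correct word.
TRANSLIT = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def build_table(correct_words):
    """Map transliterated lowercase forms to their correct lowercase forms."""
    table = {}
    for word in correct_words:
        lower = word.lower()
        if any(ch in lower for ch in "äöüß"):
            table[lower.translate(TRANSLIT)] = lower
    return table


def apply_table(word, table):
    """Replace known transliterated substrings, keeping an initial capital."""
    if not table:
        return word
    keys = sorted(table, key=len, reverse=True)      # prefer longer matches
    pattern = re.compile("|".join(map(re.escape, keys)), re.IGNORECASE)

    def fix(match):
        found = match.group(0)
        repl = table.get(found.lower())
        if repl is None:
            return found
        return repl[0].upper() + repl[1:] if found[0].isupper() else repl

    return pattern.sub(fix, word)


# Assumption: a tiny reference list instead of the real one.
table = build_table(["Führung", "Gelände", "Haus"])
for entry in ["Fuehrungsgelaende", "Gelaendefuehrung", "Abschaltfrequenz"]:
    print(entry, "->", apply_table(entry, table))
```

Substring matching can still produce wrong replacements inside unrelated words, so the result is only as good as the reference list, and one large alternation over tens of thousands of keys may be slow.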



[Edited at 2021-01-09 09:07 GMT]