Working on xliff files exported from Wordpress WPML (Studio 2019)
Thread poster: Johanne Dupuy
Johanne Dupuy
Johanne Dupuy  Identity Verified
France
Local time: 19:19
Member (2018)
English to French
May 3, 2021

Hi,
One of my clients is sending me some xliff files extracted from WordPress using WPML, and they are not usable in Studio.
Here is what I get: one big segment, a lot of text not to be translated, etc.

Does anyone have experience with xliff files extracted from WPML/WordPress?

Is there something my client should do during the extraction to avoid this? Would it be better for him to send me another format?

Is there something I can do in order to use such files in Studio? (not too time-consuming, as I get hundreds of those)

Would that work better with another plugin than WPML? (I was told about Polylang).

Thanks for your advice,

Johanne


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 19:19
Member (2006)
English to Afrikaans
+ ...
@Johanne May 4, 2021

Johanne Dupuy wrote:
Here is what I get: one big segment, a lot of text not to be translated, etc.


I don't have any experience in this, but based on what I have read about this in previous years and on what I was able to google about this today, I believe what you're describing does not mean that there is something wrong with the WPML XLIFF files, but that Trados is incapable of processing them without some tinkering.

This post from 2013 complains about the exact same thing:
https://wpml.org/forums/topic/xliff-all-content-in-single-trans-unit/
...but the developers' response makes sense to me. The content of an entire page is put into a single XLIFF segment, and it is up to the CAT tool to split it into further segments, based on the translator's own preference. (It just so happens that most CAT tools create separate segments in XLIFF files for separate sentences, but a good CAT tool should be able to segment further by sentence in its own UI even if the source content (and subsequent translated file) is segmented by paragraph or by page.)

In addition, based on some example files that I have seen, WPML puts the translatable content inside CDATA blocks (which is perfectly acceptable), and this means that some CAT tools are unable to recognise the tags as "tags". I believe Trados refers to such tags as "embedded content". Here is a post about using embedded content processing to deal with WPML tags (it doesn't go into great detail but you could search for similar terms):
https://community.sdl.com/product-groups/translationproductivity/f/studio/25698/translating-xliff-files-from-wpml-in-sdl-studio
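To illustrate why markup inside CDATA reaches the CAT tool as plain text rather than as tags, here is a minimal Python sketch. The XLIFF fragment is a made-up sample modelled on the description above (one trans-unit for the whole page, content in a CDATA block); it is not an actual WPML export, and the ids, text and function name are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, loosely modelled on the description above.
# It is NOT a real WPML export; ids and text are invented.
SAMPLE_XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="post-1" source-language="en" target-language="fr" datatype="plaintext">
    <body>
      <trans-unit id="body">
        <source><![CDATA[<p>Hello <strong>world</strong>.</p>
<p>Second paragraph.</p>]]></source>
        <target><![CDATA[]]></target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def read_sources(xliff_string):
    """Return (unit id, source text, number of child elements) per trans-unit."""
    root = ET.fromstring(xliff_string)
    result = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        # CDATA content is delivered as character data, so the HTML inside
        # it shows up in .text and <source> has no child elements at all.
        result.append((unit.get("id"), source.text or "", len(source)))
    return result


if __name__ == "__main__":
    for unit_id, text, child_count in read_sources(SAMPLE_XLIFF):
        print(unit_id, child_count)
        print(text)
```

In this sample the XML parser sees a single text node containing the HTML, which is consistent with a tool showing one big segment full of visible markup unless it applies a second, embedded-content pass.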

I imagine posting a question about this on the Trados forum would yield some answers.
https://community.sdl.com/product-groups/translationproductivity/f/general

This page on the MemoQ help file appears to give a warning that not all existing target text from a WPML XLIFF file can be trusted to be translations of the associated source text:
https://docs.memoq.com/current/en/Places/wpml-xliff-filter.html

Is there something my client should do during the extraction to avoid this?


The responses to the post from 2013 above indicate that a feature to split by paragraph may have been developed since then, and if that is so, then your client may have such an option available to them, but my guess would be that the text would still be contained in CDATA blocks anyway (so tags won't be recognised without further tweaking in Trados), and you'd still have the problem with long paragraphs.

Is there something I can do in order to use such files in Studio? (not too time-consuming, as I get hundreds of those).


I believe once you have figured out the correct embedded content processing settings, you can create a project template in Trados that you can use every time you have such files. Note that AFAIK embedded content processing works only at the time when SDLXLIFF files are created -- the changed settings are not applied to existing SDLXLIFF files. This means that while you try to figure out the correct settings, you'd have to create new temporary projects over and over until you find the correct combination of settings. It may be that someone on the official Trados forum can tell you exactly what settings to use from the get-go, though.

Would that work better with another plugin than WPML? (I was told about Polylang).


I doubt that is a solution. Installing and configuring a plugin for WordPress isn't a simple affair, and I have read that setting up a translation plugin is doubly difficult, so switching to e.g. Polylang may be wasted effort. This page says that Polylang Pro uses PO files as its translation format, but I haven't seen sample files, so I can't tell if its PO files suffer from other issues:
https://polylang.pro/doc/import-and-export-strings-translations/

[Edited at 2021-05-04 08:25 GMT]


 
Johanne Dupuy
Johanne Dupuy  Identity Verified
France
Local time: 19:19
Member (2018)
English to French
TOPIC STARTER
Thanks for this long and detailed answer! May 4, 2021

I'm indeed in contact with SDL through their forum about these questions and trying to work out what I should do in order to improve those files for translation.
I gather there are 2 problems with the files I get:
- not translatable text which appears between
- one main big segment instead of one per sentence
[Screenshot: Studio xliff]

I'm not sure that I'll find a solution for both problems.
I would be curious to know if it works better on other CAT tools.


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 19:19
Member (2006)
English to Afrikaans
+ ...
@Johanne May 4, 2021

I got my hands on a test file and here is what I found.

Trados, WFP3 and WFP6 all open the file as you described. I tried the instructions for embedded content in Trados but could not get it to work.

MemoQ, on the other hand, successfully sub-segmented and tagged the embedded content without the need for any further tweaks. I'm not sure for how long an unregistered version of MemoQ keeps going (it used to be that it reverted to a simpler version after 30 days), but you could roundtrip via MemoQ. It's a few extra steps but not too many.

The process of creating a project in MemoQ is a little weird, but it's not hard. Also, while Trados creates an SDLXLIFF file automatically, MemoQ does not create a file in its own format automatically, so it requires an extra step.

1. In MemoQ, click the little arrow on the "New Project" button and go through the dialog to create a new project (do not choose "New Project from Template").
2. On the second page of the dialog, click "Import" to import the WPML XLIFF file.
3. Later, in the project itself, right-click the file, select Export > Export Bilingual, and select the MQXLIFF format (optionally click "Plain XLIFF"). This creates an MQXLIFF file that you can open and translate in Trados (Trados obviously converts this to SDLXLIFF and then later back to MQXLIFF again).
4. When the translation is finished in Trados, use the usual method of creating the target file (erm... right-click the project).
5. In MemoQ, go to the project (double-click it in the list), select the file, click the "Import" button at the top left of the screen, choose Import and select the translated file to import it.
6. To create the final translated XLIFF file, right-click the file and select Export > Export (Choose Path) or Export (Stored Path). This creates the final WPML XLIFF file.

You can reuse a project over and over -- it's super easy to add files to an existing project in MemoQ: just drag and drop.


 
Johanne Dupuy
Johanne Dupuy  Identity Verified
France
Local time: 19:19
Member (2018)
English to French
TOPIC STARTER
MemoQ might be the solution, then... May 4, 2021

Thanks again for testing this.
I'm more and more tempted to work with MemoQ and not only for that reason...
Could you please show me or send me an extract or a picture of what you are getting with MemoQ?


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 19:19
Member (2006)
English to Afrikaans
+ ...
@Johanne May 4, 2021

Johanne Dupuy wrote:
Could you please show me ... a picture of what you are getting with MemoQ?


[Screenshot: memoq in]


 
Johanne Dupuy
Johanne Dupuy  Identity Verified
France
Local time: 19:19
Member (2018)
English to French
TOPIC STARTER
Rather convincing! May 4, 2021

Thanks again!

 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 21:19
English to Russian
Add a new segmentation rule May 4, 2021

Try this:
1. Go to Project Settings > Language Pairs > All Language Pairs > Translation Memory and Automation > select your TM (or the first/top TM in the list if you use more than one TM) > click Settings > click Language Resources > click the pencil icon on Segmentation rules for your source language.
2. In the Segmentation Rules window, click Add...
3. Type the rule name (Description field)
4. Click 'Advanced View'
5. In the left box (Before break) type this:
.[\n]+
6. In the right box (After break) type this:
.
7. Press OK 5 times to complete the process
8. Remove your file from source and re-import it using the same TM.

All these actions apply to your TM which is used for segmentation. If you use more than one TM in your project, the segmenting TM is the top one (the first in the list).
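For anyone who wants to see what such a rule does before setting it up, the effect can be approximated outside Studio. The Python sketch below is only an illustration under assumptions: it treats the "Before break" and "After break" boxes as ordinary regular expressions and breaks wherever the first matches immediately before a position and the second matches immediately after it. The function name and sample text are hypothetical, and Studio's own segmentation engine may behave differently in detail.

```python
import re

# The two patterns from steps 5 and 6 above.
BEFORE_BREAK = r".[\n]+"   # any character followed by one or more line breaks
AFTER_BREAK = r"."         # any character (i.e. more text follows)


def split_segments(text):
    """Split text wherever BEFORE_BREAK is directly followed by AFTER_BREAK."""
    boundary = re.compile(f"(?:{BEFORE_BREAK})(?={AFTER_BREAK})")
    segments = []
    start = 0
    for match in boundary.finditer(text):
        # The break position is the end of the "before" part.
        segments.append(text[start:match.end()])
        start = match.end()
    segments.append(text[start:])
    # Trim the line breaks that ended each segment and drop empty pieces.
    return [s.strip() for s in segments if s.strip()]


if __name__ == "__main__":
    page = "First paragraph.\n\nSecond paragraph.\nThird."
    for segment in split_segments(page):
        print(repr(segment))
```

Under these assumptions the rule breaks at line breaks only; splitting by sentence within a paragraph is still left to the existing full-stop rules.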


 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 21:19
English to Russian
Replace untranslatable plain text tags with placeholders May 4, 2021

If you want to convert the untranslatable plain text (tags) into real tags/placeholders, go to File > Project Settings > File Types > XLIFF > Embedded content > tick 'Enable embedded content processing' / 'Extract in all paragraphs' > click OK.
*You will need to re-import your file after these changes.
**However, I would only do this if there is no translatable text inside the tags.
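
Conceptually, turning plain-text tags into placeholders works like the following Python sketch. This is not how Studio implements embedded content processing; it is a simplified illustration with a hypothetical tag pattern and hypothetical function names, meant only to show the idea and the caveat above.

```python
import re

# Simplified assumption: anything between "<" and ">" counts as a tag.
TAG_PATTERN = re.compile(r"<[^<>]+>")


def protect_tags(text):
    """Replace each tag with a numbered placeholder such as {1}, {2}, ..."""
    tags = []

    def _swap(match):
        tags.append(match.group(0))
        return f"{{{len(tags)}}}"

    return TAG_PATTERN.sub(_swap, text), tags


def restore_tags(text, tags):
    """Put the original tags back in place of their placeholders."""
    return re.sub(r"\{(\d+)\}", lambda m: tags[int(m.group(1)) - 1], text)


if __name__ == "__main__":
    source = '<p>Hello <a href="https://example.com">world</a></p>'
    protected, tags = protect_tags(source)
    print(protected)
    translated = "{1}Bonjour {2}le monde{3}{4}"
    print(restore_tags(translated, tags))
```

The sketch also shows why the caveat matters: everything inside a tag, including attribute values such as link titles or image alt text, is hidden from the translator once the tag becomes a placeholder.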


[Edited at 2021-05-04 20:55 GMT]


 
Thorsten Schülke
Thorsten Schülke  Identity Verified
Estonia
Local time: 15:19
Member (2012)
English to German
+ ...
Have you found a solution? Dec 3, 2022

Hi Johanne,

I've stumbled upon this old thread while searching for a solution for the same problem that you had. Have you been able to resolve it? Have you tried Stepan's approach?

I also found your thread in the Trados forum and read all the answers there as well as in other related threads. Most of them are years old, but none seem to offer a practical solution. So, I was wondering if you eventually figured it out. If so, I would really appreciate it if you could post your solution here. Thank you!

I use WPML for one of my own multilingual websites, and I also just worked on a large project for a client. Their files were WPML exports. When I saw they had the same issue with huge segments (which in fact made the existing TM useless, because Trados did not recognize fuzzy matches), I did some research. That's how I came here.

The only CAT tool I've found so far that seems to work correctly with WPML exports by default is memoQ (version 9.8.8). However, it processes links/URLs as tags, so one cannot change them in the memoQ editor. This has to be done in WordPress afterwards. (Maybe there is a setting to change that, but I haven't investigated further.)

I also gave it a try with Phrase (Memsource), but with its default settings, it cannot even import the WPML files. It shows an error message and that's it. (Again, there might be some advanced import settings which can resolve the issue. But I have no idea where to start or how to set things up.)

Anyway, for now my workaround is to use memoQ and adapt the links/URLs manually in WordPress. But if there is a practical solution for Trados, I would love to hear it. Thanks!

Best wishes,

Thorsten


 
Thorsten Schülke
Thorsten Schülke  Identity Verified
Estonia
Local time: 15:19
Member (2012)
English to German
+ ...
Forget Trados and Memsource/Phrase, just use memoQ Jan 6, 2023

An update for colleagues having the same issue:

After a few more unsuccessful attempts with Trados Studio 2021 and Phrase (formerly Memsource), I've come to the conclusion that the best way to translate WPML "XLIFF" files is indeed to use memoQ.

https://wpml.org/documentation/translating-your-contents/using-desktop-cat-tools/translating-wordpress-xliff-files-in-memoq/

I am currently working with memoQ 9.8.8, which correctly separates the exported files from WPML. The only issue is that links are shown as tags. This requires an additional editing step, if you want to adapt them also to the new language version. Here is how to do that: https://helpcenter.memoq.com/hc/en-us/articles/360010266660-Editing-an-inline-tag.

When you export the XLIFFs from memoQ, the tool adds a few characters at the end of the file name indicating the target language. In my case that's "-ger". I think these have to be removed before importing the file back into WPML/WordPress. At least I did this and the import into WPML worked fine, but maybe it works with any file name. Try it out for yourself.

In Wordpress, you might need to add a space before strings with a link. It seems memoQ "forgets" them when exporting the target file, when the link tag is at the beginning of a segment. There might also be a way to fix this via the settings in memoQ, but I did not check.
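
To locate such spots before publishing, a rough check like the Python sketch below could be run over the translated HTML. It is a heuristic under assumptions, not a memoQ or WPML feature: the pattern, the function names and the sample sentence are hypothetical, and it simply flags any link tag that directly follows a character other than whitespace, ">" or "(". Review each hit by hand, since a link glued to the preceding text is sometimes intentional.

```python
import re

# Zero-width position where "<a ..." directly follows a character
# that is not whitespace, ">" or "(" (heuristic, assumption only).
GLUED_LINK = re.compile(r"(?<=[^\s>(])(?=<a\s)")


def find_glued_links(html):
    """Return the character offsets of link tags with no space before them."""
    return [match.start() for match in GLUED_LINK.finditer(html)]


def add_missing_spaces(html):
    """Insert a single space at every flagged position."""
    return GLUED_LINK.sub(" ", html)


if __name__ == "__main__":
    target = 'Voir notre<a href="/fr/">page</a>.'
    print(find_glued_links(target))
    print(add_missing_spaces(target))
```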

[Edited at 2023-01-06 12:09 GMT]


 
Thorsten Schülke
Thorsten Schülke  Identity Verified
Estonia
Local time: 15:19
Member (2012)
English to German
+ ...
The file name is irrelevant Jan 7, 2023

Update: It does not matter what file name you use for importing the translation (the exported XLIFF from memoQ) into WPML. You do not have to delete the "-ger" or whatever memoQ adds to your file for your target language (see my post above).

 

