Splitting Large TTX Files

Large TTX files are not unusual, and TagEditor handles them quickly. My usual task is merging numerous small files into a single master file with the “SDL Trados Glue” tool to make my work easier. Recently, however, I faced the opposite problem: splitting a single large TTX file between two translators. Producing two “chunks” for the translators seemed to be a challenge, but there is a way to split the file manually.

There is a tool called Splitting TTX, which is part of Expert Tools by Kaleidoscope GmbH. However, this tool is not free. If you do not split files often and are willing to make some effort, you can do it by hand.

Background information: the TTX format is rather simple; it is in fact an XML file. Simply put, there is an opening section (the “header”), a body (starting with the very first “<Tu” tag and ending with the very last “</Tu>” tag), and a closing section (the “footer”). We are interested only in the body.
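To make the three parts concrete, here is a minimal Python sketch that cuts a TTX document (already loaded as a string) into header, body, and footer exactly as described above. The function name locate_parts and the sample string are made up for illustration; the sample is a stand-in, not a complete TTX file.

```python
import re

# "<Tu" followed by whitespace or ">" so that the nested "<Tuv" is not matched.
TU_OPEN = re.compile(r"<Tu[\s>]")
TU_CLOSE = "</Tu>"


def locate_parts(ttx_text):
    """Return (header, body, footer) of a TTX document given as a string."""
    first = TU_OPEN.search(ttx_text)
    last = ttx_text.rfind(TU_CLOSE)
    if first is None or last < first.start():
        raise ValueError("no <Tu>...</Tu> body found")
    end = last + len(TU_CLOSE)
    return ttx_text[:first.start()], ttx_text[first.start():end], ttx_text[end:]


# Made-up stand-in for a real file: opening text, one unit, closing text.
sample = (
    "<!-- opening section -->"
    '<Tu Origin="manual" MatchPercent="0"><Tuv Lang="EN-US">Test</Tuv>'
    '<Tuv Lang="CS">Zkouška</Tuv></Tu>'
    "<!-- closing section -->"
)
header, body, footer = locate_parts(sample)
print(repr(header))
print(repr(footer))
```

The pattern requires whitespace or “>” after “<Tu” because a plain search for “<Tu” would also match “<Tuv”.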

In this example, we will create two partial TTX files. The procedure for creating more parts is similar.

1. Create a copy of the original TTX (let’s call the original “Source.ttx”). Name the copy as needed, but in an easy-to-understand way. I will call it “Source_1.ttx”.

2. Open Source_1.ttx (the copy, not the original) in a Unicode-capable text editor such as PSPad (free). I do not recommend Windows Notepad.

3. Search for the first “<Tu” tag.

4. Select all text from the first “<Tu” tag to the last “</Tu>” tag (including these tags).

Example of selection:

<Tu Origin="manual" MatchPercent="0"><Tuv Lang="EN-US">Test</Tuv><Tuv Lang="CS"><df Font="Arial">Zkouška</df></Tuv></Tu>

<Tu Origin="manual" MatchPercent="0"><Tuv Lang="EN-US">Test1</Tuv><Tuv Lang="CS"><df Font="Arial">Zkouška1</df></Tuv></Tu>

5. Delete this text.

6. Save this file and create an identical copy of it, saved as “Source_2.ttx”. Now we have the “skeletons” of the two partial TTX files.

7. Open the original master file (Source.ttx), search for the first “<Tu” tag, and select roughly the first half of the body. The selection must end with a “</Tu>” tag. Copy this selection, open “Source_1.ttx”, and paste the text from the clipboard at exactly the position the body occupied in the master file. Save and close “Source_1.ttx”.

8. Copy the remaining part of the body from the master file, i.e. start the selection at the next “<Tu” tag and end it with the last “</Tu>” tag in the file. Open “Source_2.ttx” and paste the text from the clipboard at exactly the position the body occupied in the master file. Save and close “Source_2.ttx”.

9. Now we have two TTX files split as we wanted.
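For those who would rather script the steps above than do them in an editor, here is a rough Python sketch of the same idea. It is only an illustration under a few assumptions: the function name split_ttx_text is made up, “<Tu>” units are assumed not to be nested, and the cut is placed right after a “</Tu>” tag near the middle of the body. Unlike the manual selection, the sketch keeps any text that sits between the cut and the next “<Tu” tag at the start of the second chunk, so nothing from the master file is dropped.

```python
import re

# "<Tu" followed by whitespace or ">" so that the nested "<Tuv" is not matched.
TU_OPEN = re.compile(r"<Tu[\s>]")
TU_CLOSE = "</Tu>"


def split_ttx_text(ttx_text):
    """Split a TTX document (as a string) into two chunks sharing one skeleton."""
    first = TU_OPEN.search(ttx_text)
    last = ttx_text.rfind(TU_CLOSE)
    if first is None or last < first.start():
        raise ValueError("no <Tu>...</Tu> body found")
    start, end = first.start(), last + len(TU_CLOSE)
    header, body, footer = ttx_text[:start], ttx_text[start:end], ttx_text[end:]

    # Offsets (inside the body) just after every closing </Tu> tag.
    closes = [m.end() for m in re.finditer(re.escape(TU_CLOSE), body)]
    if len(closes) < 2:
        raise ValueError("need at least two translation units to split")
    cut = closes[len(closes) // 2 - 1]  # roughly half of the units go to chunk 1

    return header + body[:cut] + footer, header + body[cut:] + footer


# Usage sketch. The encoding is an assumption: check how your TTX files are
# actually saved before relying on it.
# with open("Source.ttx", encoding="utf-16", newline="") as f:
#     part1, part2 = split_ttx_text(f.read())
# with open("Source_1.ttx", "w", encoding="utf-16", newline="") as f:
#     f.write(part1)
# with open("Source_2.ttx", "w", encoding="utf-16", newline="") as f:
#     f.write(part2)
```

Splitting by the number of units rather than by characters is a simplification; if the segments differ a lot in length, picking the cut by word count would balance the workload better.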

Note: When you analyze these two files, the results will differ from the analysis of the single TTX. The reason is that repetitions and fuzzy matches are not reflected properly when the split files are analyzed separately.

Merging the files can be done by the reverse procedure: simply put, join the two files and delete the “footer” section of the first chunk and the “header” section of the second chunk.
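The reverse procedure can be sketched in Python as well. Again this is only an illustration: merge_ttx_text is a made-up name, and the sketch assumes that the first chunk’s body starts with a “<Tu” tag and that both chunks were built from the same skeleton. If the second chunk still begins with the very same header as the first, exactly that header is removed; otherwise everything before the second chunk’s first “<Tu” tag is treated as its header.

```python
import re

# "<Tu" followed by whitespace or ">" so that the nested "<Tuv" is not matched.
TU_OPEN = re.compile(r"<Tu[\s>]")
TU_CLOSE = "</Tu>"


def merge_ttx_text(first_chunk, second_chunk):
    """Join two TTX chunks: drop the footer of the first, the header of the second."""
    first_open = TU_OPEN.search(first_chunk)
    last_close = first_chunk.rfind(TU_CLOSE)
    second_open = TU_OPEN.search(second_chunk)
    if first_open is None or last_close == -1 or second_open is None:
        raise ValueError("both chunks must contain at least one <Tu>...</Tu> unit")

    # First chunk without its footer: everything up to its last </Tu>.
    kept_first = first_chunk[:last_close + len(TU_CLOSE)]

    # Second chunk without its header.
    header = first_chunk[:first_open.start()]
    if second_chunk.startswith(header):
        kept_second = second_chunk[len(header):]
    else:
        kept_second = second_chunk[second_open.start():]

    return kept_first + kept_second
```

Work on copies of the translated chunks and open the merged result in TagEditor before relying on it.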

If you do not feel comfortable with this approach, you can merge the TMs produced while translating the partial TTX files and then run Translate to Fuzzy on the original Source.ttx.

