Samuel Murray, Translator - English into Afrikaans

OMEGAT: INTRODUCTION AND TUTORIAL

Disclaimer: This article is not meant as an anti-Microsoft article. The author is a satisfied user of Microsoft Word and Excel, and a happy user of Wordfast.

Section 1. Introducing OmegaT

Translators using Microsoft products are often not aware of two things. The first is that quite a few fellow-translators and clients do not use Microsoft Word. The second is that even the same version of Microsoft Word installed on two computers may not necessarily be 100% mutually compatible. Because they are unaware of these two facts, many translators are fearful of making a change to non-Microsoft products.

There are usually other reasons for sticking to Microsoft Word regardless. One such reason is that translators feel many translation-related programs are not available for word processors other than Microsoft Word. Take Trados, for example - there is no Trados for WordPerfect. Or take any of the three best-known Afrikaans spell-checkers - all work only on Microsoft Word.

One serious alternative to Microsoft Word is OpenOffice Writer. Neither Trados nor the more affordable Wordfast can be used with OpenOffice Writer, and this used to be a serious stumbling block for translators accustomed to computer assisted translation (CAT) technologies with translation memory, fuzzy matching and active glossary look-up.

This is where OmegaT comes in. OmegaT is a freeware translation memory program that doesn't run on Microsoft Word. It seamlessly imports and exports plaintext, OpenOffice Writer documents and HTML.

OmegaT is a no-nonsense tool that increases productivity and consistency without taking creativity out of a translator's hands. The only requirement for using OmegaT is a reasonably fast computer with a Java Runtime Environment installed on it.

This section deals with Java JDK/SDK 1.4x, OpenOffice 1.1, and OmegaT 1.3.5. For simplicity's sake I assume the reader has Windows, but these programs are also available for Linux and the new Mac OSX. OmegaT needs Java to work. OpenOffice is optional, but it is useful to have.

Java JDK/SDK 1.4x

Programs written in Java have one small drawback - the user needs to have Java installed on his computer to use them. The advantage of Java is that a single program can be used on many different types of computers as long as those computers have their version of Java installed. For this reason OmegaT is just as accessible to Linux or Mac users as it is to Windows users.

You can download the latest version of Java at http://java.sun.com/. The recommended version for OmegaT 1.3x is Java SDK 1.4x. Mac OSX computers have Java 1.4.2x pre-installed.

Java based programs tend to use a little more processing power than ready-compiled programs written in other popular programming languages, which is why using a fast computer is preferable. That said, I've had satisfactory results with OmegaT 1.0.2 and Java SDK 1.2 on an old Pentium 333 with 64 MB RAM and Windows 95.

Some web browsers such as Mozilla or Internet Explorer have a Java Runtime Environment (also called a virtual machine) embedded in the browser. Unfortunately these Java installations are often capable of running web applications only, and they can't be used for fully fledged programs such as OmegaT.

Don't try to install Java on a machine that has Java already installed on it. To test if you already have Java installed, try running the OmegaT program by double-clicking the JAR or BAT file, or see if your computer gives the option to uninstall Java in the Add/Remove Programs utility in the Control Panel. If you have Java, but you decide to get a newer version, first uninstall the older version.
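
If you would rather check for Java from a script than by double-clicking, the following Python sketch looks for a Java launcher on the system path. It is only an illustration: the function names are made up for this example, and it assumes that a standalone Java installation puts a program called "java" on the path, which is not guaranteed on every machine.

```python
import shutil
import subprocess


def find_java(executable="java"):
    """Return the full path of the Java launcher found on the PATH, or None."""
    return shutil.which(executable)


def java_version(executable="java"):
    """Return the text printed by `java -version`, or None if Java is absent."""
    path = find_java(executable)
    if path is None:
        return None
    # `java -version` traditionally writes to stderr, so capture both streams.
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    return (result.stderr or result.stdout).strip()
```

Calling java_version() returns None when no launcher is found. In that case Java is either not installed or not on the path, and the Add/Remove Programs check described above is the more reliable test.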

OpenOffice 1.1x

OpenOffice is a whole suite of office programs. The bundle includes a word processor, a spreadsheet, a presentations editor, a graphics editor, and an HTML editor.

Most Microsoft Word and Excel documents can be opened, edited and saved in OpenOffice Writer and Spreadsheet. Some features present in Microsoft Office are not included in OpenOffice, and vice versa. OpenOffice will usually prompt you to save as OpenOffice format, but there's always the option to save documents in a Microsoft format.

You can download the latest stable version of OpenOffice 1.1x at http://www.openoffice.org/. If you have a recent version of Java installed on your computer, the OpenOffice installer will enable extra features in the office suite, but Java is not required for OpenOffice to work.

Although OmegaT can import and export OpenOffice Writer files, OpenOffice itself is not required for OmegaT to work. In fact, OmegaT will work on OpenOffice files even if OpenOffice is not installed on the same computer. But OmegaT cannot convert OpenOffice files to Microsoft format and vice versa - for that you need OpenOffice itself.

OpenOffice has its own macro language, but it cannot import Microsoft's VBA based macro language used in Word and Excel. For this reason macro based programs such as Wordfast, Wordfisher and Ando cannot run on OpenOffice. Similarly, a macro written for OpenOffice will not run in Microsoft Word or Excel.

While macros can add functionality to documents, they are often not explicitly required for viewing the document. Translators using OpenOffice may wish to tell clients not to send them documents that depend on macros (although in most cases, OpenOffice will keep the Microsoft Word macro intact even if it can't execute it).

There is a free Afrikaans spell-checker available for OpenOffice. This spell-checker is based partly on the revised Bernard Nieuwoudt word list of 160 000 words. You can download the spell-checker and view detailed installation instructions from http://www.translate.org.za/. Spell-checkers and thesauri for other languages are linked-to from the OpenOffice web site itself.

OmegaT 1.3.5

The biggest advantage of OmegaT for translators is consistency. OmegaT generally also speeds up the translation process.

OmegaT automatically finds similar or exact phrases in a translation project which might span several files in different formats. This enables the translator to check the paragraph he is translating at that moment with similar sentences elsewhere in the project.

If those similar sentences have been translated already, OmegaT tells the translator how they have been translated. One could say the translator is consulting himself, and need no longer struggle with the same or similar difficult paragraphs all over again.

The latest version of OmegaT can be downloaded at http://www.omegat.org/omegat/omegat.html. For slower computers, version 1.0.2 is recommended (it's also the version covered by the OmegaT user manual), but version 1.3x has a simpler interface. You can put questions to fellow OmegaT users at http://groups.yahoo.com/group/OmegaT/.

Although OmegaT aids and automates some of the translation processes, it is not simply a case of importing the source file, pressing a few buttons, and exporting the target file. The translator himself does the translating, and before he can start, he must set up a few things manually.

Briefly, OmegaT first creates a set of project folders. The user then copies the source documents into the source text folder. The user translates those documents using OmegaT. When the user is finished, he uses OmegaT to compile the translation into the final product.

Very brief introduction to CAT

*CAT = computer assisted translation.

Before the translation process begins, OmegaT creates a translation memory based on the individual paragraphs contained in all the source text documents. These paragraphs are called segments in CAT-speak. OmegaT also checks to see if it can find segments that are either moderately similar or almost exactly alike. Moderately similar segments are called fuzzy matches and segments that are more than 99% alike are called exact matches.
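
The difference between a fuzzy match and an exact match can be sketched in a few lines of Python. This is not OmegaT's own matching algorithm, which the article does not describe. It uses Python's standard difflib module as a stand-in, takes the 99% threshold from the paragraph above, and assumes (purely for illustration) that "moderately similar" means at least 50% alike.

```python
from difflib import SequenceMatcher

EXACT_THRESHOLD = 0.99  # from the text: more than 99% alike
FUZZY_THRESHOLD = 0.50  # assumption for illustration only


def similarity(segment_a, segment_b):
    """Return a similarity score between 0.0 and 1.0 for two segments."""
    return SequenceMatcher(None, segment_a, segment_b).ratio()


def classify(segment_a, segment_b):
    """Label a pair of segments as 'exact', 'fuzzy' or 'none'."""
    score = similarity(segment_a, segment_b)
    if score > EXACT_THRESHOLD:
        return "exact"
    if score >= FUZZY_THRESHOLD:
        return "fuzzy"
    return "none"
```

With these thresholds, two identical segments are an exact match, while "Click the Save button." and "Click the Open button." are a fuzzy match.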

When the user moves his cursor to any segment, OmegaT checks the translation memory for fuzzy or exact matches, and alerts the user to their existence using colour codes. If the match has been translated previously, the user can opt to re-use or partially re-use the existing translation.

Although this is extremely helpful in texts with a lot of repetitive or near-repetitive segments, it usually speeds up the translation process even in texts that are not repetitive. The main advantage is better overall consistency within the translation project.

Unlike many other CAT tools that segment by sentence or phrase, OmegaT segments by paragraph. There are, however, OpenOffice macros available to convert real sentences to virtual paragraphs, which enable OmegaT to create smaller segments.
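
To see why the segmentation method matters, compare the two approaches in this Python sketch. It assumes that paragraphs in a plaintext file are separated by blank lines and uses a deliberately naive rule for sentence ends. Neither function is taken from OmegaT or from the macros mentioned above.

```python
import re


def segment_by_paragraph(text):
    """Split text into paragraphs (assumed to be separated by blank lines)."""
    parts = re.split(r"\n\s*\n", text.strip())
    # Collapse line breaks inside a paragraph into single spaces.
    return [" ".join(part.split()) for part in parts if part.strip()]


def segment_by_sentence(paragraph):
    """Naively split a paragraph after '.', '!' or '?' followed by a space."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
```

A paragraph with two sentences yields one segment with the first function and two with the second. Smaller segments are more likely to find matches elsewhere in the project.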

The user can copy his own glossaries to the project's glossary subfolder. OmegaT will then automatically search the glossaries for words that occur in the source segments and alert the translator to their existence. This is useful especially when glossaries contain thousands of entries. This feature is disabled in OmegaT 1.3.5 but reappears in later versions.

Section 2. Brief tutorial for OmegaT

OmegaT is a tool that helps translators work more consistently in medium to large translation projects. It enables translators to automatically compare paragraphs that are similar. If a similar paragraph has been translated, OmegaT will even tell the translator how it was translated. In this way translators needn't struggle with difficult sentences all over again.

This is a tutorial for OmegaT 1.3.5. To run this version of OmegaT on your computer, you need at least Java SDK 1.4x installed as well. For simplicity's sake I will assume the reader uses Microsoft Windows, but you can use Linux or Mac OSX as well. Download OmegaT at http://www.omegat.org/omegat/omegat.html. Java can be downloaded at http://java.sun.com/.

Installing OmegaT

Download OmegaT and unzip the file to C:\OmegaT\ (or any other folder of your choice). OmegaT should run when you double-click the file OmegaT.jar. If it doesn't, you may have to edit the path information in the OmegaT.bat file to point to your Java Runtime Environment's folder.

On my computer, the path to run OmegaT is (without line wrap): C:\Progra~1\Java\j2sdk1.4.2\jre\bin\java.exe -jar C:\OmegaT\OmegaT.jar
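
The command above has three parts: the Java launcher inside the JRE's bin folder, the -jar option, and the path to OmegaT.jar. The Python sketch below assembles the same command from the two folder names, which may help if your folders differ from mine. The function name is invented for this example.

```python
from pathlib import PureWindowsPath


def build_command(jre_folder, omegat_folder):
    """Build the command that starts OmegaT, as a list of arguments."""
    java_exe = PureWindowsPath(jre_folder) / "bin" / "java.exe"
    jar_file = PureWindowsPath(omegat_folder) / "OmegaT.jar"
    return [str(java_exe), "-jar", str(jar_file)]


# The folders below are the ones on my computer; yours will probably differ.
command = build_command(r"C:\Progra~1\Java\j2sdk1.4.2\jre", r"C:\OmegaT")
```

The resulting list can be passed to subprocess.run() on a Windows machine, or joined with spaces and pasted into the OmegaT.bat file.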

When you run OmegaT from the command line or via the OmegaT.bat file, there may be an extra Command Prompt or MS DOS window open. Do not close this window or it might close OmegaT as well.

OmegaT and/or Java has one slightly annoying habit on Windows machines. When browsing a dialog box, OmegaT might try to repeatedly access the A:\ drive. Simply click Cancel when this happens, and if the dialog box should seem to have disappeared, use Alt-Tab to find it again.

Do yourself a favour and download an OmegaT user manual as well (the current manual covers version 1.0.2 only). If you save the manual in C:\OmegaT\docs\, and you rename the index file to OmegaT.html, you'll be able to access the manual directly from within OmegaT.

Creating and opening a project

If you run OmegaT on a small screen, maximise it when it starts up.

Create a new set of project folders by clicking File -> Create. Browse to the folder where you want to save OmegaT's files (for example My Documents), and type in the name of a project. Click Save. OmegaT will automatically create a source, target, glossary and TM (translation memory) folder, but you can change the location of these folders on the screen that pops up right after you clicked Save.

You should also specify the source and target languages of the current project on the same screen. The language codes consist of a two-letter language code, a hyphen and a two-letter country code. Afrikaans might be AF-ZA and English may be EN-GB (the standard country code for the United Kingdom is GB, not UK). Click OK.
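
The shape of such a language code is easy to check mechanically. The following Python sketch tests only the pattern described above (two letters, a hyphen, two letters). It does not check whether the language or country actually exists, and it is not how OmegaT itself validates the codes.

```python
import re

# Two letters, a hyphen, two letters, for example AF-ZA.
LANGUAGE_CODE = re.compile(r"[A-Za-z]{2}-[A-Za-z]{2}")


def is_valid_language_code(code):
    """Return True if the code has the form xx-XX described in the text."""
    return LANGUAGE_CODE.fullmatch(code) is not None
```

For example, is_valid_language_code("AF-ZA") is True, while "AF_ZA" and "AFR-ZA" are rejected.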

Minimise OmegaT and browse to the project folder that you specified. Copy your source files into the subfolder called "source".
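
For readers who prefer to see the folder layout spelled out, here is a Python sketch that builds a similar structure by hand and copies source files into it. It is not a substitute for File -> Create, because OmegaT also writes its own project settings. The lower-case subfolder names are my assumption of what OmegaT creates, so check them against a real project.

```python
import shutil
from pathlib import Path

# Assumed subfolder names; compare with a project created by OmegaT itself.
SUBFOLDERS = ("source", "target", "glossary", "tm", "omegat")


def create_project(root, name):
    """Create the project folder and its subfolders; return the project path."""
    project = Path(root) / name
    for sub in SUBFOLDERS:
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project


def add_source_files(project, files):
    """Copy the given files into the project's source subfolder."""
    copied = []
    for file in files:
        destination = Path(project) / "source" / Path(file).name
        shutil.copy2(file, destination)
        copied.append(destination)
    return copied
```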

Now maximise OmegaT again, and click File -> Open. Browse to the project folder (which might even have a little OmegaT icon), and select it, and click Open. If you forgot to add source text files to the source subfolder, or if you forgot to convert Microsoft Word files to OpenOffice format, you'll get an error message. OmegaT will load the first source file and also open a window called Project Files with a list of files and their number of segments.

The translation process

The entire text of the current document is visible at all times. As you translate the segments, the source text will be replaced by the target text. The source text of translated segments will not be visible unless you navigate to those segments.

In OmegaT 1.3.5, the current segment is highlighted in yellow. OmegaT creates a copy of the segment and displays it between two segment tags. The first two tags are <segment 0001> and <end segment>. Translate the text between the two tags by overwriting or replacing the source text.

Tip: To make a clearer distinction between text that has been translated and non-translated text, click Configuration -> Fonts and change the appearance of the non-translated text to something very different from the translated text. In this way unchanged segments will stand out more clearly.

Most normal text editing functions work inside the editing screen. You can try Ctrl+C for copy, Ctrl+X for cut, and Ctrl+V for paste. Try Shift+Ctrl+arrow to highlight words and overtype them. To protect the end-of-segment tag, you can't highlight the entire last word and delete it (but you can highlight it up to the second-to-last character and delete it).

Type your translation, delete any left-over source text, and press Ctrl+N to move on to the next segment. You can at any time move back to previous segments with Ctrl+P. You don't have to translate segments in any given order, and you can change or edit segments you've already translated - simply move to the segment you want to translate or edit by using Ctrl+N or Ctrl+P.

Tip: Pressing ENTER also moves to the next segment (same as Ctrl+N). Ctrl+ENTER moves to the previous segment (same as Ctrl+P).

Remember to click File -> Save regularly. Saving the file will not compile the target text, however. If you want to see what the target text looks like, click File -> Compile, and then open the file in the target subfolder using the appropriate viewer.

While you can only work on one file at a time, you don't have to finish one file before moving on to another. Click File -> Show file list to open the Project Files window, and click on the file you want to translate. Remember to click File -> Save before changing to a different file.

Fuzzy matching

OmegaT 1.3.5 supports fuzzy matching using bright colours. Only one fuzzy match is displayed at a time, but you can cycle through other matches by using Ctrl+1, Ctrl+2 etc. You can re-use a fuzzy match by using Ctrl+R or Ctrl+I. Ctrl+R will replace the entire target segment with the fuzzy match text. Ctrl+I will add the fuzzy match text at the cursor position. You can even add more than one fuzzy match's text by using Ctrl+numbers and Ctrl+I repeatedly.
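
The difference between the two shortcuts can be shown with two small Python functions. They only model the effect on the text of the target segment as described above. The function names and the idea of a numeric cursor position are mine, not OmegaT's.

```python
def replace_with_match(target_text, match_text):
    """Model of Ctrl+R: the whole target segment becomes the fuzzy match."""
    return match_text


def insert_match(target_text, cursor, match_text):
    """Model of Ctrl+I: the fuzzy match is added at the cursor position."""
    return target_text[:cursor] + match_text + target_text[cursor:]
```

Inserting "'n huis" at position 7 of "Dit is ." gives "Dit is 'n huis.", whereas replacing would discard whatever had already been typed.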

Find functions

OmegaT offers two types of searches, with wild cards. Click Edit -> Find (or Ctrl+F) to do a search. OmegaT 1.3.5 can search for phrases or multiple search terms on Linux but not on Windows.

The so-called Exact Match search finds whole words and partial words from both the source and target text of the current project. If you've copied old TMs from previous projects to the current project's TM subfolder, select Search TMs to search them as well. The so-called Keyword search finds whole words only, and it searches through the source text only.
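
The two search types differ in what they match (partial words or whole words only) and where they look (source and target, or source only). The Python sketch below models just that difference on a list of source/target pairs. It ignores wild cards and old TMs, and it assumes that searches are not case sensitive, which the article does not say.

```python
import re


def exact_match_search(term, segments):
    """Find the term, even inside longer words, in source or target text."""
    needle = term.lower()
    return [
        index
        for index, (source, target) in enumerate(segments)
        if needle in source.lower() or needle in target.lower()
    ]


def keyword_search(term, segments):
    """Find the term as a whole word, in the source text only."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [
        index
        for index, (source, _target) in enumerate(segments)
        if pattern.search(source)
    ]


segments = [
    ("The houses are old.", "Die huise is oud."),
    ("A house.", "'n Huis."),
]
```

Here an Exact Match search for "house" finds both segments, because "houses" contains "house", while a Keyword search finds only the second one. A Keyword search for "huis" finds nothing, because it never looks at the target text.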

Automatic pre-translation

If the current project contains many segments that are similar to those of a previous project, you can attempt to pre-translate your current source files using the old TMs. Simply copy the old TMs to the current project's TM subfolder and click Tools -> Pseudo translate. This action cannot be undone, so make backups.

When performing a pseudo-translate, OmegaT replaces all target text segments in the entire project with either the first available exact match in the translation memory or with the source text prepended with "omegat" and a hyphen.
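
The rule described above can be sketched in Python as follows. The sketch simplifies "exact match" to mean an identical source text, and it represents the translation memory as a plain list of source/target pairs. Both simplifications are mine.

```python
def pseudo_translate(source_segments, tm_pairs):
    """Return one target text per source segment.

    tm_pairs is a list of (source, target) tuples in TM order. A segment
    with no identical source text in the TM gets the prefix "omegat-".
    """
    first_match = {}
    for source, target in tm_pairs:
        # setdefault keeps the first translation found for each source text.
        first_match.setdefault(source, target)
    return [
        first_match.get(segment, "omegat-" + segment)
        for segment in source_segments
    ]
```

Note that every segment receives a new target text, including segments you had already translated by hand. That is why the backup mentioned above matters.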

TM and glossary formats

OmegaT uses the industry standard TMX 1.1 format for translation memory. This means that it can import TM files made by Trados, Wordfast or any other standards compliant CAT tool as long as the TM file was saved in TMX version 1.1 (not version 1.2+). You can also import OmegaT's TMs into those applications.
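
A TMX file is an XML file in which each translation unit (tu) holds one variant (tuv) per language, and each variant holds the text in a seg element. The Python sketch below reads such a file into source/target pairs. The sample file is hand-written for this example and is not output from OmegaT. The sketch also assumes that older TMX versions store the language in a "lang" attribute and newer ones in "xml:lang", so it looks for both.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hand-written sample for illustration; attribute values are placeholders.
SAMPLE_TMX = """
<tmx version="1.1">
  <header creationtool="example" creationtoolversion="0" segtype="paragraph"
          o-tmf="example" adminlang="EN-US" srclang="EN-US"
          datatype="plaintext"/>
  <body>
    <tu>
      <tuv lang="EN-US"><seg>This is a house.</seg></tuv>
      <tuv lang="AF-ZA"><seg>Dit is 'n huis.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_tmx(xml_text, source_lang, target_lang):
    """Return a list of (source, target) pairs for the two languages."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for unit in root.iter("tu"):
        by_lang = {}
        for variant in unit.findall("tuv"):
            lang = variant.get("lang") or variant.get(XML_LANG) or ""
            seg = variant.find("seg")
            if seg is not None:
                by_lang[lang.upper()] = "".join(seg.itertext())
        source = by_lang.get(source_lang.upper())
        target = by_lang.get(target_lang.upper())
        if source is not None and target is not None:
            pairs.append((source, target))
    return pairs
```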

The glossary format is a plaintext tab delimited file. The first column contains the source word, the second contains the translation, and a third column contains any comments. Glossaries must be created using a text editor and saved in the project's glossary subfolder before using OmegaT. OmegaT automatically recognises glossary terms, but contains no internal glossary editor.
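
Because the glossary is a simple tab delimited file, it is easy to read with a short script. The Python sketch below parses the three columns and looks up which glossary terms occur in a source segment. The whole-word, case-insensitive look-up is my assumption and may differ from what OmegaT does.

```python
import csv
import io
import re


def parse_glossary(text):
    """Parse tab delimited glossary text into (source, target, comment)."""
    entries = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        comment = row[2] if len(row) > 2 else ""
        entries.append((row[0], row[1], comment))
    return entries


def find_terms(segment, glossary):
    """Return the glossary entries whose source term occurs in the segment."""
    found = []
    for source, target, comment in glossary:
        pattern = r"\b" + re.escape(source) + r"\b"
        if re.search(pattern, segment, re.IGNORECASE):
            found.append((source, target, comment))
    return found
```

A two-line glossary such as "house<TAB>huis<TAB>noun" and "translator<TAB>vertaler" yields one match for the segment "This is a house."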

Customising OmegaT

OmegaT determines file type by examining the file name extension. It will regard myfile.txt as a plaintext file, it will regard myfile.sxw as an OpenOffice file, and it will regard myfile.rob as an unknown file type (and consequently ignore it even if it resides in the project's source subfolder). By editing the file_extension_mapping file in the project's omegat subfolder, you can tell OmegaT to regard myfile.rob as a TXT file, an HTML file or a SXW file.

OmegaT will help translate all files of known file type that reside in the project's source folder. By adding the names of certain files to the ignore_file_list file, also in the omegat subfolder, you can tell OmegaT to skip over specific files even if they occur in the source folder.
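
The combined effect of the two files can be modelled in Python as a look-up table plus an ignore list. The sketch does not read or write the real file_extension_mapping or ignore_file_list files, whose exact syntax is not covered in this article. It only shows the logic, and the type names are invented.

```python
from pathlib import PurePath

# Built-in file types mentioned in the text; the labels are invented.
DEFAULT_MAPPING = {".txt": "plaintext", ".html": "html", ".sxw": "openoffice"}


def files_to_translate(file_names, extra_mapping=None, ignore=()):
    """Return (file name, file type) for every file OmegaT would handle."""
    mapping = dict(DEFAULT_MAPPING)
    mapping.update(extra_mapping or {})
    ignored = set(ignore)
    result = []
    for name in file_names:
        if name in ignored:
            continue  # listed in the ignore list
        file_type = mapping.get(PurePath(name).suffix.lower())
        if file_type is not None:  # unknown extensions are skipped
            result.append((name, file_type))
    return result
```

With the default mapping, myfile.rob is skipped. Passing extra_mapping={".rob": "plaintext"} makes it appear in the result as a plaintext file.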

Working with formatted documents

When translating plaintext files in OmegaT, you don't have to worry about things like bold, italics or changes in the font. With HTML or OpenOffice documents, however, OmegaT has to somehow retain the formatting. This is done using internal tags.

A sentence like "This is a house" in which the word "This" is in bold, "a" is in a different font, and "house" is in italics, will be marked up in OmegaT as "<f0>This</f1> is <f2>a</f3> <f4>house.</f5>".

If you translate this as "<f0>Dit</f1> is <f2>'n</f3> <f4>huis.</f5>", OmegaT will know which words must be in bold, in a different font or in italics. When the final product is compiled, the translated document will look the same as the original.

Before compiling the final product, click Tools -> Validate tags to check for any errors in internal tags. Uncorrected errors will corrupt the final OpenOffice or HTML document.
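
The idea behind tag validation can be sketched in Python: collect the internal tags from the source and the target segment and compare them. The sketch assumes that the tags must appear in the target in the same order as in the source, and it recognises only tags of the form <f0> and </f1>. OmegaT's own check may be stricter or more lenient.

```python
import re

# Internal tags of the form <f0> or </f1>.
TAG = re.compile(r"</?f\d+>")


def extract_tags(segment):
    """Return the internal tags of a segment in order of appearance."""
    return TAG.findall(segment)


def tags_are_valid(source, target):
    """Return True if the target has the same tags, in the same order."""
    return extract_tags(source) == extract_tags(target)


source = "<f0>This</f1> is <f2>a</f3> <f4>house.</f5>"
good_target = "<f0>Dit</f1> is <f2>'n</f3> <f4>huis.</f5>"
bad_target = "<f0>Dit</f1> is 'n <f4>huis.</f5>"
```

Here tags_are_valid(source, good_target) is True, while bad_target fails because two tags were deleted during translation.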

An annoying feature present in OmegaT 1.3.5 (but hopefully fixed in later versions) is that the style used for the first word of the source text will also be the style used for the first word of the target text.

Licensing and support

OmegaT is open source software licensed in terms of the GNU General Public License (version 2). In lay terms this means that OmegaT is unrestricted freeware for all types of use (including commercial), and that anyone may modify the source code to suit their own requirements.

OmegaT is developed as a pet project by Keith Godfrey, but Keith can unfortunately not offer direct support for his product. Instead, users may read posts at http://wiki.wesolveitnet.com/ or join the mailing list at http://groups.yahoo.com/group/omegat/. Here more experienced users may be able to assist new users.