Software localization



Introduction

Software localization (or localisation) could simply mean "translation of software to another language", including adaptation of some formats (e.g. measures and dates) and currency. But usually, software localization implies more.

Firstly, software should be internationalized (I18N), i.e. designed in a way that it can be adapted to various languages and regions without engineering changes to the programming logic. In principle, it is "stuff that has to be done once". But sometimes I18N is done in stages, e.g. developers often forget that some languages are more verbose than others (need more space) or are written in other directions (e.g. right-to-left).

Localization ("L10N") means adaptation of a product to the needs of a particular population in a precise geographic region. Such a definition implies that translation includes linguistic, cultural and ergonomic aspects (Le grand dictionnaire terminologique). McKethan and White (2005) define localization as “the process of adapting an internationalized product to a specific language, script, cultural, and coded character set environment. In localization, the same semantics are preserved while the syntax may be changed.” The authors further argue that “Localization goes beyond mere translation. The user must be able to not only select the desired language, but other local conventions as well. For instance, one can select German as a language, but also Switzerland as the specific locale of German. Locale allows for national or locale-specific variations on the usage of format, currency, spellchecker, punctuation, etc., all within the single German language area.”

Gregory M. Shreve (retrieved 16:27, 21 January 2010 (UTC)) adds adaptation of "non-textual materials". “Localization is the process of preparing locale-specific versions of a product and consists of the translation of textual material into the language and textual conventions of the target locale and the adaptation of non-textual materials and delivery mechanisms to take into account the cultural requirements of that locale.”

Usually, language localization extends to national subcultures. E.g. German would be declined into de-de (Germany) and other versions for Switzerland and Austria. National versions of a language include different words, different spellings (like "localization" vs. "localisation"), and sometimes different grammar. Conversely, a multilingual country like Switzerland would have Swiss German (de-ch), Swiss French (fr-ch) and Swiss Italian (it-ch). The latter would have in common the way date/time, decimals and currency are represented. Software translators may adopt one of the following strategies:

  • Create a generic translation for one language, e.g. fr and then add specific local variants like fr_fr, fr_be on top of it.
  • Create only one translation, e.g. fr_fr, and have users cope with it. Date/time and decimal/currency representation differences should be handled though. Most often, this is the case in free open source software.
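
The first strategy (a generic translation plus local variants on top of it) can be sketched as a lookup with a fallback chain. This is only a minimal illustration: the catalog contents, the constant names and the function names are hypothetical, and a real project would load the strings from language files.

```python
# Hypothetical in-memory catalogs: "fr" is the generic translation,
# "fr_be" and "fr_ch" only hold the strings that differ locally.
CATALOGS = {
    "fr": {"menu.file": "Fichier", "msg.ninety": "quatre-vingt-dix"},
    "fr_be": {"msg.ninety": "nonante"},
    "fr_ch": {"msg.ninety": "nonante"},
}


def fallback_chain(locale):
    """Return the catalogs to try, most specific first: 'fr-BE' -> ['fr_be', 'fr']."""
    locale = locale.lower().replace("-", "_")
    chain = [locale]
    if "_" in locale:
        chain.append(locale.split("_", 1)[0])
    return chain


def translate(key, locale, catalogs=CATALOGS):
    """Look the constant up in the local variant first, then in the generic language."""
    for name in fallback_chain(locale):
        text = catalogs.get(name, {}).get(key)
        if text is not None:
            return text
    return key  # nothing found: fall back to the constant name itself


print(translate("msg.ninety", "fr-BE"))  # local variant wins
print(translate("menu.file", "fr-BE"))   # falls back to generic "fr"
```

With this layout, a variant such as fr_be only needs the handful of strings that actually differ from the generic fr translation.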

Finally, software globalization (G11N), also known as National Language Support, is the combination of software internationalization (I18N) and localization (L10N).

Let's recapitulate and explain the abbreviations for internationalization, localization and globalization:

Internationalization, known as I18N, is abbreviated with a so-called numeronym, where 18 stands for the number of letters between the first i and the last n in internationalization.

Localization, known as l10n (or L10N), is composed of the l of localization, followed by 10 letters (ocalizatio) and the final n of localization.

Software globalization in short: G11N = I18N + L10N
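
The construction rule of these numeronyms (first letter, number of letters in between, last letter) is simple enough to write down as a small sketch. The function name is just an illustrative choice.

```python
def numeronym(word):
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) < 3:
        return word  # nothing to count between first and last letter
    return f"{word[0]}{len(word) - 2}{word[-1]}"


for term in ("internationalization", "localization", "globalization"):
    print(term, "->", numeronym(term).upper())
# internationalization -> I18N
# localization -> L10N
# globalization -> G11N
```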

Issues

Since we shall focus on translation of open source software, we shall stress the importance of "ergonomic aspects" as the #1 priority of localization. Ergonomic translation covers both "surface usability" (users can understand the meaning of user interface elements and system messages) and cognitive ergonomics (users can get meaningful tasks done with the system).

Infrastructure and people

In a larger project, the list of types of participants can be quite long: E.g. Gregory M. Shreve identifies: Project managers, Translators (Generic), Localization Translators (Specialists), Terminologists, Internationalization/Localization Engineers (Software Background), Proofreaders, QA specialists, Testing engineers, Multilingual Desktop publishing specialists.

Now, what would be the absolute minimal roles in a volunteer-based open source project?

  • one person to coordinate software development and translation
  • one person to coach software string translators (can be the one above)
  • one person to coach manual writing, including a glossary (can be the one above)
  • translators
  • users that provide feedback about ergonomics (difficult to get) and spelling

Target

The target of I18N and L10N should be thought of in terms of the whole system. A software product may include:

  • User documentation (includes several genres and is available through several media, e.g. the software itself, HTML, paper/PDF)
    • Manuals
    • Short contextual help
    • Glossary
    • Tutorials
    • ....
  • Software (user)
    • Menus and icons (visual command language)
    • Messages (various output)
    • Command languages (maybe)
    • ....
  • Software documentation
    • Documentation of language constants (and other useful elements)
    • Developer manuals (maybe)

Since translation work is often split up between several volunteers (e.g. software modules, manuals, tutorials), there ought to be some potential for synergy that FOSS developers might try to develop. E.g. people writing the tutorials should also comment on the meaningfulness of system messages as well as other usability issues, and not just try to adapt the user to the system.

On the code side

  • Language files
    • All output messages to the user must be defined as a kind of constant that the programmers will use
    • The name of the constant should be meaningful to translators. E.g.
    • Language files must be separate (if terms are not in a database)
  • Encoding
  • Space and layout
    • Space of text fields: Some languages are more verbose and one must plan for that, by using wider icons, menu items, user input fields and such, or else use a "fluid" design.
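
The space issue can be checked mechanically, at least roughly. The sketch below compares two language files (represented here as plain dictionaries keyed by constant names) and flags translations that are much longer than the source string. The sample strings, the constant names and the 1.3 expansion threshold are assumptions made for illustration, and character counts are only a crude proxy for rendered width.

```python
# Hypothetical language files, keyed by constant name.
EN = {
    "modulename.mainmenu.file": "File",
    "modulename.mainmenu.edit": "Edit",
    "modulename.mainmenu.settings": "Settings",
}
DE = {
    "modulename.mainmenu.file": "Datei",
    "modulename.mainmenu.edit": "Bearbeiten",
    "modulename.mainmenu.settings": "Einstellungen",
}


def overflowing_strings(source, target, max_ratio=1.3):
    """Return (constant, source length, target length) for translations
    longer than max_ratio times the source string."""
    flagged = []
    for key, src in source.items():
        dst = target.get(key)
        if dst and len(dst) > max_ratio * len(src):
            flagged.append((key, len(src), len(dst)))
    return flagged


for key, src_len, dst_len in overflowing_strings(EN, DE):
    print(f"{key}: {src_len} -> {dst_len} characters")
```

A report like this only tells designers where to look; whether a label really overflows depends on fonts and on the layout.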

Managing volunteers in an open source project

Open source projects usually don't have the funding to pay professional translators. This situation has disadvantages but also some advantages.

Disadvantages: Quality of the translation, completion (untranslated strings for new versions, missing languages, etc.)
Advantages: Meaningfulness of the translation (often translators are users, i.e. they have know-how of the tool which "normal affordable" translators do not have).

I (16:27, 21 January 2010 (UTC)) believe that there ought to be some strategies to improve volunteer translation efforts. The main issues:

(1) Motivating people to help and to continue translating

(2) Making sure that the translation is usable by providing a decent enough translation support environment (see next item)

Technical infrastructure

Translators should "see" what they translate. This concerns several items. When translating a string, the translator should be able to see:

  • The name of the constant (which must be meaningful, e.g. "modulename.mainmenu.edit" or "modulename.errormsg.upload.xxx")
  • A meaningful short description. This description may include a link to a glossary.
  • All other translations (language strings) the translator understands (e.g. if I translate to German, I'd like to see both English and French)
  • (If possible) the constant displayed in the interface. That of course requires extra programming. Or even better: be able to edit strings directly in the interface
  • Tools for consistency
    • When translating hundreds of strings (and the situation gets worse if it's done by several people) there should be a way to search through all terms in all modules in several ways:
      • Find same expressions in the target language and display another language next to them
      • Find same expressions in another language and display the target strings next to them
    • Be able to edit and consult a short glossary that includes the most important terms (might be combined with the general user manual)
    • (dreaming) direct access to some online translation dictionary like the English/French grand dictionnaire terminologique
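
The consistency search can be sketched in a few lines, assuming that all language strings are available as one dictionary per language, keyed by constant name. The sample strings and the function name are hypothetical.

```python
# Hypothetical catalogs: one dictionary per language, keyed by constant name.
CATALOGS = {
    "en": {
        "files.menu.upload": "Upload file",
        "files.errormsg.upload": "Upload failed",
        "files.menu.delete": "Delete",
    },
    "de": {
        "files.menu.upload": "Datei hochladen",
        "files.errormsg.upload": "Upload fehlgeschlagen",
        "files.menu.delete": "Löschen",
    },
}


def find_same_expression(catalogs, search_lang, expression, show_lang):
    """Find constants whose text in search_lang contains the expression
    (case-insensitive) and show the show_lang string next to each hit."""
    needle = expression.casefold()
    rows = []
    for key, text in sorted(catalogs[search_lang].items()):
        if needle in text.casefold():
            rows.append((key, text, catalogs[show_lang].get(key, "")))
    return rows


# Which German strings correspond to English strings containing "upload"?
for key, found, shown in find_same_expression(CATALOGS, "en", "upload", "de"):
    print(f"{key}: {found!r} | {shown!r}")
```

In this made-up data, the search reveals that "upload" was rendered once as "hochladen" and once left as "Upload", which is exactly the kind of inconsistency a translator would want to spot. Swapping the two language arguments gives the search in the other direction.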

Mozilla example

Let's examine a few features of the Mozilla L10N strategy.

In the Mozilla project, localization strings are managed through XUL. The following fragment displays two strings that are referenced through so-called tokens:

 <caption label="&identityTitle.label;"/>
 <description>&identityDesc.label;</description>

identityTitle.label and identityDesc.label will be substituted by strings defined as entities in a DTD.

<!ENTITY identityTitle.label "Identity">
<!ENTITY identityDesc.label "Each account has an identity, which is the ↵
↳ information that other people see when they read your messages.">

("↵ ↳" indicates a single line, broken for readability.) If you want to find more examples of constants defined as XML entities, locate the Firefox installation directory on your computer and examine the chrome/ab-CD.jar file, e.g. en-US.jar.
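
To make the substitution mechanism concrete, here is a minimal sketch that mimics it with regular expressions. In practice an XML parser resolves entities while reading the document; the regular expressions and function names below are illustrative assumptions, not Mozilla code, and they only handle simple double-quoted entity declarations.

```python
import re

# Simplified patterns: entity declarations with double-quoted values,
# and entity references of the form &name;
ENTITY_DECL = re.compile(r'<!ENTITY\s+([\w.\-]+)\s+"([^"]*)"\s*>')
ENTITY_REF = re.compile(r"&([\w.\-]+);")

DTD = (
    '<!ENTITY identityTitle.label "Identity">\n'
    '<!ENTITY identityDesc.label "Each account has an identity, which is the '
    'information that other people see when they read your messages.">'
)
XUL = (
    '<caption label="&identityTitle.label;"/>\n'
    "<description>&identityDesc.label;</description>"
)


def parse_entities(dtd_text):
    """Collect entity name -> replacement string from the DTD text."""
    return dict(ENTITY_DECL.findall(dtd_text))


def substitute(xul_text, entities):
    """Replace each &name; reference; unknown references are left untouched."""
    return ENTITY_REF.sub(lambda m: entities.get(m.group(1), m.group(0)), xul_text)


print(substitute(XUL, parse_entities(DTD)))
```

Localizing then amounts to shipping a different DTD (one per locale) with the same entity names, while the XUL file stays unchanged.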

The L10N tools include

  • A text editor that can handle UTF-8 files
  • A langpack2cvstree.sh script that converts the en-US language package into another locale.
  • A command line/web tool called compare-locales: finds missing and obsolete strings in a localization
  • MozillaBuild: An easy way to install everything you need to checkout/pull and checkin/push your localization and run compare-locales on Windows.
  • Mozilla Translator is a tool to help translate programs
  • Narro is a web application that allows online translation and coordination. You can see it in operation at l10n.mozilla.org
  • Translate Toolkit (moz2po and po2moz): converts various sorts of Mozilla files to Gettext PO format for translation efforts using a PO editor, and the other way round. It's used by Pootle, for example (see below).
  • MozLCDB: similar to PO but more dedicated to Mozilla products
  • Pootle: a web server for localisation that allows web-based contributions and management. Combined with the Translate Toolkit it allows Mozilla products to be localised online.
  • Virtaal is an off-line PO editor developed by the Pootle team.
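
The idea behind finding missing and obsolete strings can be sketched as a set comparison between the reference locale and a localization. This is not the actual compare-locales tool, only an illustration of the principle; the function name and the sample strings are hypothetical.

```python
def compare_string_sets(reference, localization):
    """Compare the keys of a reference locale (e.g. en-US) and a localization.

    missing:  present in the reference but not yet translated
    obsolete: still in the localization but no longer in the reference
    """
    ref_keys, loc_keys = set(reference), set(localization)
    return {
        "missing": sorted(ref_keys - loc_keys),
        "obsolete": sorted(loc_keys - ref_keys),
    }


# Hypothetical strings for a reference locale and a French localization.
en_us = {"identityTitle.label": "Identity", "identityDesc.label": "Each account ..."}
fr = {"identityTitle.label": "Identité", "oldDialog.title": "Ancien dialogue"}

print(compare_string_sets(en_us, fr))
```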

We wonder a bit who uses which tools. If we understand the situation right, there are several ways to translate as long as the translation does find its way back into the CVS at some point.

Also, it is not surprising that the project makes a clear distinction between "official releases" and others. Official releases include translation of the installation and migration process, localizing the start page and other web pages built into the product, customizing settings like "live bookmarks", locally relevant search engine plugins, and more.

Now I wonder if translating learning management systems into different languages could imply using a different pedagogical vocabulary. E.g. "French didactics" vs. "Belgian instructional design" vs. "Canadian constructivism" (without any English word) vs. Swiss "let's have a bit of all".

Technical issues

Document Formats

In open source, there exist several strategies:

  • Gettext is based on the idea that the keys used to retrieve local language strings correspond to the original strings used in the source code. Documentation is also added as a programming comment just before the corresponding line. From the source code, so-called .PO files are generated, which are then used by translators.
  • XLIFF (XML Localization Interchange File Format) is an XML-based format created to standardize localization. XLIFF was standardized by OASIS in 2002.

There exist converters from PO to XLIFF.
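
To illustrate the Gettext idea that the original string is the key, here is a minimal sketch that reads a tiny PO fragment. The sample entries and the file reference editor.c are made up, and the parser is deliberately simplified (single-line msgid/msgstr pairs only, no plurals, no escapes); real projects would rely on the gettext tools or a PO library instead.

```python
# A made-up PO fragment: "#." lines carry comments extracted from the source
# code, "#:" lines point to the place where the string is used.
PO_SAMPLE = '''\
#. Label of the button that saves the document
#: editor.c:42
msgid "Save"
msgstr "Enregistrer"

#: editor.c:57
msgid "Upload failed"
msgstr ""
'''


def parse_po(text):
    """Parse single-line msgid/msgstr pairs together with their '#.' comments."""
    entries, comment, msgid = [], [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#."):
            comment.append(line[2:].strip())
        elif line.startswith("msgid "):
            msgid = line[len("msgid "):].strip().strip('"')
        elif line.startswith("msgstr ") and msgid is not None:
            msgstr = line[len("msgstr "):].strip().strip('"')
            entries.append(
                {"msgid": msgid, "msgstr": msgstr, "comment": " ".join(comment)}
            )
            comment, msgid = [], None
    return entries


def lookup(msgid, entries):
    """Return the translation, or the original string if none exists yet."""
    for entry in entries:
        if entry["msgid"] == msgid and entry["msgstr"]:
            return entry["msgstr"]
    return msgid


entries = parse_po(PO_SAMPLE)
print(lookup("Save", entries))
print(lookup("Upload failed", entries))
```

Because the key is the source string itself, an untranslated entry simply falls back to the original text, which is why partially translated software stays usable.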

Software

PO editors

Links

Definitions
Organizations
How-to
Internationalization
  • Dotnet-culture.net provides information about date/time and decimal/currency representation.
About language file formats
Example Project and languages
Courses
Software
  • Translate Toolkit (Wikipedia)
  • More: Trados® Freelance™, Atril Déjà Vu, STAR Transit, SDLX™, IBM TranslationManager
Indexes

Bibliography

  • McKethan, Kenneth A. (Sandy) Jr. and Graciela White (2005). Demystifying Software Globalization, Translation Journal 9 (2), April 2005. HTML, retrieved 16:27, 21 January 2010 (UTC).
  • Esselink, Bert (2000). A Practical Guide to Localization, John Benjamins Publishing, ISBN 1-58811-006-0