Software localization: Difference between revisions
m (using an external editor) |
m (using an external editor) |
||
Line 175: | Line 175: | ||
=== Wiktionary === | === Wiktionary === | ||
{{quotation|Designed as the lexical companion to Wikipedia, the encyclopaedia project, Wiktionary has grown beyond a standard dictionary and now includes a thesaurus, a rhyme guide, phrase books, language statistics and extensive appendices. We aim to include not only the definition of a word, but also enough information to really understand it. Thus etymologies, pronunciations, sample quotations, synonyms, antonyms and translations are included.}} ([http://en.wiktionary.org/wiki/Wiktionary:Main_Page Main Page], retrieved 17: | {{quotation|Designed as the lexical companion to Wikipedia, the encyclopaedia project, Wiktionary has grown beyond a standard dictionary and now includes a thesaurus, a rhyme guide, phrase books, language statistics and extensive appendices. We aim to include not only the definition of a word, but also enough information to really understand it. Thus etymologies, pronunciations, sample quotations, synonyms, antonyms and translations are included.}} ([http://en.wiktionary.org/wiki/Wiktionary:Main_Page Main Page], retrieved 17:36, 29 January 2010 (UTC)). Advanced users may [http://en.wiktionary.org/wiki/Wiktionary:Preferences customize] Wiktionary preferences. Entries can be search or looked up through various indexes. | ||
The structure of Wiktionary entries (pages) seems to be a moving target, i.e. together with emerging needs complexity is added or page structure is modified. Also, various language variants don't look the same. I.e. the french version includes the use of icons in from of main headings (much prettier ...). Finally, exploitation of various possible headers and templates may be different in various culture. E.g. for a same volume, the German version seems to be more systematic, i.e. more entries seem to be written in a sort of semantic network perspective, i.e. typically includes super- and sub-terms. Let's have a look at ''Internet'', top of english, french and german pages (Jan 2010): | The structure of Wiktionary entries (pages) seems to be a moving target, i.e. together with emerging needs complexity is added or page structure is modified. Also, various language variants don't look the same. I.e. the french version includes the use of icons in from of main headings (much prettier ...). Finally, exploitation of various possible headers and templates may be different in various culture. E.g. for a same volume, the German version seems to be more systematic, i.e. more entries seem to be written in a sort of semantic network perspective, i.e. typically includes super- and sub-terms. Let's have a look at ''Internet'', top of english, french and german pages (Jan 2010): | ||
Line 301: | Line 301: | ||
As the reader may notice, there are many templates (<nowiki>{{....}}</nowiki>). An author may just create minimal entries or fix existing entries. Advanced authors, as we mentionned above, may choose from probably hundreds of different templates. | As the reader may notice, there are many templates (<nowiki>{{....}}</nowiki>). An author may just create minimal entries or fix existing entries. Advanced authors, as we mentionned above, may choose from probably hundreds of different templates. | ||
[[image:Wiktionary-user-feedback.png|frame|right|Feedback box in the English Wiktionary (Jan 2010)]] | Let's now look at tools for discussion and user feedback. [[image:Wiktionary-user-feedback.png|frame|right|Feedback box in the English Wiktionary (Jan 2010)]] | ||
In the English version (Jan 2010), users may leave a feedback. Users can first signal a problem, i.e. choose among: | In the English version and some but not all other language version (Jan 2010), users may leave a feedback. Users can first signal a problem, i.e. choose among: | ||
* Good | * Good | ||
* Bad | * Bad | ||
Line 316: | Line 316: | ||
Discussion does sometimes happen on wiktionary.org. A nice and funny example is the [http://en.wiktionary.org/wiki/e-dictionary e-dictionary's] [http://en.wiktionary.org/wiki/Talk:e-dictionary discussion page] (about 10 times as long as the entry ...) | Discussion does sometimes happen on wiktionary.org. A nice and funny example is the [http://en.wiktionary.org/wiki/e-dictionary e-dictionary's] [http://en.wiktionary.org/wiki/Talk:e-dictionary discussion page] (about 10 times as long as the entry ...) | ||
We believe that ''Wiktionary'' is an interesting project and for several reasons: | |||
* The flexibilty of the Mediawiki infrastructure allows to cope with emerging views about what a dictionary entry should be. Indeed, from a simple dictionary it evolved into a complex one that inlcudes etymologies, pronunciations, sample quotations, synonyms, antonyms and translations (and more related kind of terms in the German version). | |||
* Wiktionary also integrates well with other [http://wikimedia.org/ Wikimedia] sister projects like Wikipedia or Wikiversity. | |||
Remark on the history: Most Wikitionary entries, as you can read in the [http://en.wikipedia.org/wiki/Wiktionary Wiktionary] entry of Wikipedia were initially created by bots: {{quotation| most of the entries and many of the definitions at the project's largest language editions were created by bots that found creative ways to generate entries or (rarely) automatically imported thousands of entries from previously published dictionaries. Seven of the 18 bots registered at the English Wiktionary[4] created 163,000 of the entries there.[5] Only 259 entries remain (each containing many definitions) on Wiktionary from the original import by Websterbot from public domain sources; the majority of those imports have been split out to thousands of proper entries manually.}} | |||
=== The Google translator toolkit === | === The Google translator toolkit === | ||
Line 510: | Line 516: | ||
# Substantial automation of the Localization process | # Substantial automation of the Localization process | ||
This references model, according to the [http://wiki.oasis-open.org/oaxal/FrontPage Reference Model for Open Architecture for XML Authoring and Localization 1.0], retrieved 17: | This references model, according to the [http://wiki.oasis-open.org/oaxal/FrontPage Reference Model for Open Architecture for XML Authoring and Localization 1.0], retrieved 17:36, 29 January 2010 (UTC) has two prime use cases: | ||
# Support authoring and localization within a CMS environment | # Support authoring and localization within a CMS environment | ||
# Support consistency within a translation-only environment | # Support consistency within a translation-only environment | ||
Line 563: | Line 569: | ||
; Free online translation support systems | ; Free online translation support systems | ||
* [http://translate.google.com/ Google translate]. Note: There is a way to get support for mediawiki files. Upload a filed called ''*.mediawiki''. E.g. edit wikipage and copy/paste source or get it by ''&action=raw'' URL Get parameter, etc. Do not whine if some formatting doesn't render well (each mediawiki is different). This is a non-documented feature and we guess that it may become official after some further development - [[User:Daniel K. Schneider|Daniel K. Schneider]] 17: | * [http://translate.google.com/ Google translate]. Note: There is a way to get support for mediawiki files. Upload a filed called ''*.mediawiki''. E.g. edit wikipage and copy/paste source or get it by ''&action=raw'' URL Get parameter, etc. Do not whine if some formatting doesn't render well (each mediawiki is different). This is a non-documented feature and we guess that it may become official after some further development - [[User:Daniel K. Schneider|Daniel K. Schneider]] 17:36, 29 January 2010 (UTC). | ||
; Frameworks | ; Frameworks | ||
Line 638: | Line 644: | ||
; XLIFF | ; XLIFF | ||
* [http://www.maxprograms.com/articles/xliff.html XML in localisation: Use XLIFF to translate documents] by Rodolfo M. Rava. | * [http://www.maxprograms.com/articles/xliff.html XML in localisation: Use XLIFF to translate documents] by Rodolfo M. Rava. | ||
* [http://developers.sun.com/dev/gadc/technicalpublications/articles/xliff.html XLIFF: An Aid To Localization], retrieved 17: | * [http://developers.sun.com/dev/gadc/technicalpublications/articles/xliff.html XLIFF: An Aid To Localization], retrieved 17:36, 29 January 2010 (UTC). (Oracle/Sun Developer Network (SDN). | ||
; About language file formats | ; About language file formats |
Revision as of 18:36, 29 January 2010
This article or section is currently under construction
In principle, someone is working on it and there should be a better version in a not so distant future.
If you want to modify this page, please discuss it with the person working on it (see the "history")
<pageby nominor="false" comments="false"/>
Introduction: What do do mean by localization ?
Software localization (or localisation) could mean simple translation of software a interface and messages to another language plus adaptation of some formats (e.g. measures, dates and currency). But usually, software localization implies more.
Firstly, the software should be internationalized (I18N), i.e. desiged in a way that it can be adapted to various languages and regions (not just one) without engineering changes to the programming logic. In principle, it is stuff that has to be done once. But sometimes I18N is done in stages, e.g. often developers forget that some languages are more verbose than others (need more space) or work in other directions (e.g. right-to-left).
In a second step, software has to localized, i.e. translated and adapted to local cultures. Localization (L10N) means adaption of a product following the needs of a particular population in a precise geographic region. Needs include linguistic, cultural and ergonomic aspects. (Le grand dictionnaire terminologique). McKethan and White (2005) define localization as “the process of adapting an internationalized product to a specific language, script, cultural, and coded character set environment. In localization, the same semantics are preserved while the syntax may be changed.” The authors further argue that “Localization goes beyond mere translation. The user must be able to not only select the desired language, but other local conventions as well. For instance, one can select German as a language, but also Switzerland as the specific locale of German. Locale allows for national or locale-specific variations on the usage of format, currency, spellchecker, punctuation, etc., all within the single German language area.”
Gregory M. Shreve (retrieved 16:27, 21 January 2010 (UTC)) adds adaptation of "non-textual materials" like colors, logos and pictures. “Localization is the process of preparing locale-specific versions of a product and consists of the translation of textual material into the language and textual conventions of the target locale and the adaptation of non-textual materials and delivery mechanisms to take into account the cultural requirements of that locale.”
The W3C Internationalzation web site also argues that
localization is often a substantially more complex issue. It can entail customization related to:
- Numeric, date and time formats
- Use of currency
- Keyboard usage
- Collation and sorting
- Symbols, icons and colors
- Text and graphics containing references to objects, actions or ideas which, in a given culture, may be subject to misinterpretation or viewed as insensitive.
- Varying legal requirements
- and many more things.
Usually, language localization extends to national subcultures. E.g. German would decline in De-de (German for Germany) and other versions for Switzerland (de-ch) and Austria (de-at), etc. National versions of a language include different words, different spellings (like "localization" vs. "localisation"), and sometimes different grammar. Conversely, a multi-lingual country like Switzerland would have German Swiss (de-ch), French Suiss (fr-ch) and Italian-Swiss (it-ch). These country-specific localizations have in common the way date/time, decimals and currency is represented. Software translators may adapt the following strategies:
- Create a generic translation for one language, e.g. fr, and then add specific local variants like fr_fr, fr_be on top of it.
- Create only one translation, e.g. fr_fr, and have users cope with it. Data/time and decimal/currency representation differences should be handled though. Most often, this is the case in free open source software.
- Sometimes, localizations only exist for major sub-cultures, e.g. a de-de and de-ch version. Other german speaking cultures like de-at (too close to German) or de-lu and de-li (too small) are not covered.
Finally, the term Software globalization (G11N), also known as National Language Support, refers to the combination of software internationalization (I18N) and localization (L10N).
Let's recapitulate and explain the abbreviations for internationalization, localization and globalization. The "funny" kind of abbreviations used in the localization community are so-called numeronyms as we shall explain for the I18N.
- Internationalization, known as I18N. The 18 stands for the number of letters between the first i and last n in internationalization.
- Localization, known as l10n (or L10N) , is composed of the l of localization, followed by 10 letters (ocalizatio) and the final n of localization.
- Software globalization, known as G11N refers to: I18N + L10N
- Finally, GILT stands for Globalization, Internationalization, Localization and Translation, i.e. all matters related to localization.
Issues
In this chapter we shall explorer some important issues. Since we shall focus on translation of open source software, we shall stress the importance of ergonomic aspects as the #1 priority of localization. By ergonomic aspects we mean both "surface usability" (users can understand the meaning of UI interface elements and system messages) and cognitive ergonomics (user can get meaningful tasks done with the system).
Ergonomy considerations apply to both the products (software, manuals), and to the programmer and translator environments (tools, community, etc.)
Infrastructure and people
In a larger project, the list of types of participants can be quite long: E.g. Gregory M. Shreve identifies: Project managers, Translators (Generic), Localization Translators (Specialists), Terminologists, Internationalization/Localization Engineers (Software Background), Proofreaders, QA specialists, Testing engineers, Multilingual Desktop publishing specialists.
In a volonteer-based open source project we might see the following roles (that my be cumulated by the same person)
- one person to coordinate software development and translation
- one person to coach software string translators
- one person to coach manual writing, including a glossary
- volonteer translators
- volonteer users that provide feedback about the UI ergonomics, spelling, in-context help, etc.
Localization targets
I18N and L10N should be thought of in terms of the whole system. A software product may include:
- User documentation (includes several genres and available through several media, e.g. within the software itself, HTML, paper/PDF)
- Manuals
- Short contextual help
- Glossary
- Tutorials
- ....
- Software (user)
- Menus and Icons (visual command language)
- Command languages (sometimes)
- Messages (various output)
- ....
- Software documentation
- Documentation of language constants (and other useful elements)
- Developer manuals (maybe)
Since translation work is often split up between many volonteers with domain knowledge, there ought to be some potential for synergy that FOSS project managers and developers might try to exploit and develop. E.g. people writing the tutorials also should make comments about the meaningfulness of system messages as well as other usability issues, and not just try to adapt the user to the system.
In addition, translation of software modules, manuals, tutorials ought to be coordinated somewhere, see the terminology issue below.
Managing volonteers in an open source project
Open source projects don't have the funding to pay professional translators. This situation has disadvantages but also some advantages.
- Disadvantages: Quality of the translation, completion (untranslated strings for new versions, missing languages, etc.)
- Advantages: Meaninfulness of the translation. Often translators are users, i.e. they have know how of the tool which "normal affordable" translators do not have. According to [ LISA], a second interesting fact is that “the natural tendency of the localizer is to magnify the difference between [two expressions in the source language that refer to the same thing], just to be on the safe side. The result is that localizers tend to find the problems in a product that were not previously noted, problems that may well have confused users in the source language as well.”
I (16:27, 21 January 2010 (UTC)) believe that there ought to be some strategies to improve volonteer translation efforts and to profit from emerging insights. The main issues:
(1) Motivating people to help and to continue translating
(2) Make sure that translation is usable by providing a decent enough translation support environment (see next item)
(3) Make sure to profit from translator's domain knowledge, plus inconsistencies he may detect in the original source language interface and messages.
Terminology management / Glossary
Terminology management refers to the fact that critical and specialized terms must be clearly defined in the source language and be linked to target languages. With respect to software localization, this should be done independently of translation of UI elements and system messages.
The Localization Industry Standards Association (LISA) defines terminology management as follows: “Quality translation relies on the correct use of specialized terms. It improves reader understanding and reduces the time and costs associated with translation. Special terminology management systems store terms and their translations, so that terms can be translated consistently. Full-featured systems go beyond simple term lookup, however, to contain information about terms, such as part of speech, alternate terms and synonyms, product line information, and usage notes. They are generally integrated with translation memory systems and word processors to improve translator productivity.” (retrieved 15:51, 27 January 2010 (UTC)).
Silvia Pavel defines terminological data as “Any information relating to a term or the concept designated. Note: The more common terminological data include the entry term, equivalents, alternate terms, definitions, contexts, sources, usage labels, and subject fields.” This can lead to rather large records, e.g. have a look at the term Apprentissage/Learning in the Termium Plus database.
Terminology management is not a simple formal question of agreeing on some ways of translating words. It's related to fundamental aspects of language use in the sense of language as action and that requires common grounding. (Clark, 1996).
See the example section below for some online software that can help establish common lexical items, including ways to support process of meaning negotiation between various actors.
On the code side
- Language files
- All output messages to the user must be defined as a kind of name constant that the programmers will use. Alternatively, it can be decided that source langage strings (plus a documentation string nearby in the code) are the constants.
- Name of the constant should be meaningful and available to translators (since at some point here must a common way to talk about a precise translation item)
- Languages files must be separate for the translators (e.g. in a textfile, a localization format, or ad database)
- Encoding
- Nowadys, project should systematically use some form of Unicode
- Space and Layout
- Space of text fields: Some languages are more verbose and one must plan for that, by using wider icons, menu items, user input fields and such or else use a "fluid" design.
Reviewing
Localized content should be tested in target. In other words: Using the software for real or almost real by real target persons does provide good feedback. If this is not an option, the reviewer must be able to "imagine" a real situation, i.e. receive enough contextual information.
In an open-source project like LAMS that allows for frequent updating one could ask volonteer teachers to provide feedback and to ask their learners to do the same (at least write down meaningless/difficult to understand) UI elements and system messages. If this is not possible, translators should at least play all the roles: Administer the system, design both a complex and a simple sequence, use it as teacher (including the monitoring) tool, and experience it as learner. Another option could be "problem/suggestions" button to be installed optionally by the administrator in the interfaces and that would allows user to complain and make suggestions in an easy, quick and non-obtrusive way.
This idea needs to be fleshed out.
Examples of software that collects user information
- The "report Broken Web Site" and "Report Web Forgery" buttons in Firefox (under the Help menu)
- The dialog you get once MS Help dialogs end up without solutions, e.g. three questions: (1) was it useful, (2) why ? and (3) additional information. Now I would like to know what kind of volume / 1000 users this generates and how it is being used.
- Some google help files have a similar interface. (1) Was this information helpful, followed by (2) Easy to understand? (3) Complete with enough details ?
- Google translate. E.g. learning activity management system translated to french "système d'apprentissage en gestion de l'activité" which is totally wrong. Users can contribute a better translation. In addition they could activate the translators toolkit, which includes glossary management, chat, etc.
Technical infrastructure
Translators should "see" what they translate. This implies concerns several items. When translating a string, the translator should be able to see:
- The name of the constant (which must be meaningful, e.g. "modulename.mainmenu.edit" or "modulename.errormsg.upload.xxx")
- A meanigful short description. This description may include a link to a glossary.
- All other translations (languages strings) the translator understands (e.g. if I translate to German, I'd like see both English and French)
- (If possible) the constant displayed in the interface. That of course requires extra programming. Or even better: be able to edit strings directly on the interface
Tools for consistency: When translating hundreds of strings (and the situation gets worse if it's done by several people) there should be a way to search through all terms in all modules in three ways:
- Find same expressions in the target language and display an other language next to it
- Find same expressions in another language and display the target strings next to it
- Be able to edit and consult a short glossary that includes the most important terms (might be combined with the general user manual)
- (dreaming) direct access to some online translation dictionary like the english/frenchgrand dictionnaire terminologique
See also the terminology management / glossary above
Examples of terminology and translation tools used in FOSS projects
In this section, we will look at various examples that raised our interest.
The open terminology forum
The Open Terminology forum is a freely-accessible terminology database whose content is supplied and validated by the members of a non-profit organization. According to What is it? page, “anyone who can show that they have specialized knowledge can become a member of the Forum and participate in their area of expertise, whether they are a professional, a translator, a journalist or an academic. Members can add terminology information, comment on (but not change) that contributed by others, make personal lists of selections of terms and create their personal descriptions with their own HTML links.” , retrieved 15:51, 27 January 2010 (UTC). “Each area of specialization has its own terms. The Forum is a place where these terms can be recorded, refined, discussed and validated by those concerned. As every item of information shown is personally attributed to the contributor, members can build up their reputations”
The main window shows terms in a three column layout. Terms can searched with wildcards (* and ?). A user can then click on each of the "meaning", "English term" and "French equivalent" in order to further explore or to edit the context (with permission)
(1) Click on meaning, then Info will provide contextual information. E.g. the English term "classroom management" is not translated (should become "gestion de classe") is associated with the rather useless "for a lesson" meaning, but clicking on "for a lesson" will show a good definition plus a link.
(2) Click on a term will allow to add variants, occurences, synonyms, translations or a comment.
(3)Click on help will show agreement among experts and some other information.
Wiktionary
“Designed as the lexical companion to Wikipedia, the encyclopaedia project, Wiktionary has grown beyond a standard dictionary and now includes a thesaurus, a rhyme guide, phrase books, language statistics and extensive appendices. We aim to include not only the definition of a word, but also enough information to really understand it. Thus etymologies, pronunciations, sample quotations, synonyms, antonyms and translations are included.” (Main Page, retrieved 17:36, 29 January 2010 (UTC)). Advanced users may customize Wiktionary preferences. Entries can be search or looked up through various indexes.
The structure of Wiktionary entries (pages) seems to be a moving target, i.e. together with emerging needs complexity is added or page structure is modified. Also, various language variants don't look the same. I.e. the french version includes the use of icons in from of main headings (much prettier ...). Finally, exploitation of various possible headers and templates may be different in various culture. E.g. for a same volume, the German version seems to be more systematic, i.e. more entries seem to be written in a sort of semantic network perspective, i.e. typically includes super- and sub-terms. Let's have a look at Internet, top of english, french and german pages (Jan 2010):
Wikionary entries are normal Mediawiki pages, but follow conventions for the use of titles. In addition, users must understand how to use and to find (!) templates. A Mediawiki template can be defined as some kind of extension language that allows for shortcuts, complex formatting, transclusions, etc. More precisely, Wiktionary's Index to templates provides the following definition: “Templates in Wiktionary are a convenient and consistent way of including a snippet of text in various, similar articles. Within the text of an entry, enclosing a template name inside the double curly braces {{xxx}} will instruct Wiktionary to insert right there in the entry when it is viewed the contents of the page [[Template:xxx]].”. In addition, there exist shortcuts which are a specialized types of a redirection page that can be used to get to a commonly used Wiktionary reference page more quickly. Fairly complex languages to learn that makes Daniel K. Schneider sometimes wonder whether Wikipedia and its sister projects are not the world viewed by IT-savy people...
Wiktionary pages may include translations if the spelling of the word is the same. E.g. the localisation page has both an English and a french entry. But usually, a page will refer to entries of other languages through a special template, which allows to transclude these into the page.
A minimal page typically has the following structure:
== English == ===Noun=== === References ===
The structure of a more complete entry can be illustrated by the phrase word. Its table of contents looks like this.
1. English 1.1 Pronunciation 1.2 Etymology 1.3 Noun 1.3.1 Synonyms 1.3.2 Derived terms 1.3.3 See also 1.3.4 Translations 1.4 Verb 1.4.1 Derived terms 1.4.2 Related terms 1.4.3 Translations 1.5 External links 1.6 Anagrams 2. French 2.1 Pronunciation 2.2 Noun 2.3 Anagrams
The french section is included because the word "phrase" does exist in french and means "sentence". The translation sections (one for the noun and one for the verb) includes definitions from many languages by transclusion.
Let's now examine the wiki code of an other medium-sized entry in more detail. The source wiki code of localization entry looks like this:
{{wikipedia}} {{also|localisation}} ==English== ===Etymology=== From [[localize]] + [[-ation]]; compare French ''[[localisation#French|localisation]]'' ===Alternative spellings=== * [[localisation]] (''British'') ===Noun=== {{en-noun}} # The [[act]] of [[localize|localizing]]. # The [[state]] of being [[localized]]. ====Derived terms==== * [[l10n]], [[L10n]] * [[cerebral localization]] * [[chromosomal localization]] ====Related terms==== * [[locale]] * [[localism]] * [[locality]] * [[locate]] ====Translations==== {{trans-top|act of localizing}} * Chinese: :* Mandrin: {{zh-tsp||地方化|difang-hua}} * Dutch: {{t+|nl|lokalisatie|f}} * Finnish: {{t+|fi|lokalisoida}}, {{t+|fi|kotoistaa}} {{trans-mid}} [ .... contents removed .... ] * Norwegian: {{t-|no|lokalisering|f}} * Russian: {{t+|ru|локализация|sc=Cyrl}} {{trans-bottom}} {{trans-top|state of being localized}} * Chinese: :* Mandrin: {{zh-ts||本地化}} * Dutch: {{t+|nl|lokalisatie|f}} {{trans-mid}} * Norwegian: {{t-|no|lokalisering|f}} * Russian: [[локализованность]] {{trans-bottom}} {{trans-top|software engineering: act or process of making a product suitable for use in a particular country or region}} * Chinese: :* Mandrin: {{zh-ts||本地化}} * Dutch: landelijke aanpassing [ .... contents removed .... ] * Russian: [[локализация]] {{trans-bottom}} {{checktrans-top}} * {{ttbc|French}}: {{t+|fr|localisation|f}} * {{ttbc|German}}: [[Lokalisation]] * {{ttbc|Italian}}: [[localizzazione]] {{trans-mid}} (rookaraizeeshon) * {{ttbc|Korean}}: [[지방화]] * {{ttbc|Latin}}: [[localizatio]] * {{ttbc|Portuguese}}: [[localização]] * {{ttbc|Spanish}}: [[localización]] * {{ttbc|Turkish}}: [[Bölge]] {{trans-bottom}} ====See also==== * [[internationalization]] / [[i18n]]
As the reader may notice, there are many templates ({{....}}). An author may just create minimal entries or fix existing entries. Advanced authors, as we mentionned above, may choose from probably hundreds of different templates.
Let's now look at tools for discussion and user feedback.
In the English version and some but not all other language version (Jan 2010), users may leave a feedback. Users can first signal a problem, i.e. choose among:
- Good
- Bad
- Messy
- Mistake in definition
- Confusing
- Could not find the word I want
- Incomplete
- Entry has inaccurate information
- Definition is too complicated
Optionally, a user then may leave a note. It probably will append it to the discussion page. Each mediawiki page has an associated discussion page (including one for this page you are reading).
Discussion does sometimes happen on wiktionary.org. A nice and funny example is the e-dictionary's discussion page (about 10 times as long as the entry ...)
We believe that Wiktionary is an interesting project and for several reasons:
- The flexibilty of the Mediawiki infrastructure allows to cope with emerging views about what a dictionary entry should be. Indeed, from a simple dictionary it evolved into a complex one that inlcudes etymologies, pronunciations, sample quotations, synonyms, antonyms and translations (and more related kind of terms in the German version).
- Wiktionary also integrates well with other Wikimedia sister projects like Wikipedia or Wikiversity.
Remark on the history: Most Wikitionary entries, as you can read in the Wiktionary entry of Wikipedia were initially created by bots: “most of the entries and many of the definitions at the project's largest language editions were created by bots that found creative ways to generate entries or (rarely) automatically imported thousands of entries from previously published dictionaries. Seven of the 18 bots registered at the English Wiktionary[4] created 163,000 of the entries there.[5] Only 259 entries remain (each containing many definitions) on Wiktionary from the original import by Websterbot from public domain sources; the majority of those imports have been split out to thousands of proper entries manually.”
The Google translator toolkit
According to the Google translator toolkit basics page, the kit can do the following:
- Upload Word documents, OpenOffice, RTF, HTML, text, Wikipedia articles and knols.
- Use previous human translations and machine translation to 'pretranslate' your uploaded documents.
- Use our simple WYSIWYG editor to improve the pretranslation.
- Invite others (by email) to edit or view your translations.
- Edit documents online with whomever you choose.
- Download documents to your desktop in their native formats --- Word, OpenOffice, RTF or HTML.
- Publish your Wikipedia and knol translations back to Wikipedia or Knol.
When uploading you can/must:(as of 1/2010)
- choose among the four supported formats
- Use translation memory (global or local)
- Use glossary (none, one of yours)
Once a text is uploaded, it stays in Google memory (at least for a while) and users can then do a side by side translation in the supported formats. Too bad that Google does not suport Mediawiki format without passing through Wikipedia.
- Glossaries
Let's look at Google's Translation memories, glossaries, and placeholders. Part of this kit includes the possibility to upload glossaries and to save TMX files. A glossary is a CSV table with the following format: A arbitrary number of columns with locales (at least) one followed by an optional part of speech (pos) and description columns.
en-AU | fr | gsw | locale4 | .... | pos | description |
---|---|---|---|---|---|---|
lams | mouton | lämmli | noun | Either a software system or variant of sheep. Do not use as a verb | ||
gate | verrou | schperri | noun | An element in pedagogical sequencing that will filter access to the next element in various ways. (... more needed here ...) |
Since CSV uses commas to separate items, Commas in a cell must be escaped by double-quotes ("). For example, to add the term hello, world!, use "hello, world!". Quotes within quotes are escaped by \. For example, She said, "hello." should be entered as "She said, \"hello.\"". Such glossaries can be made with a spreadsheet (e.g. the google docs) and then be exported as CSV and then imported to the translation tool.
- Translation memory
“A translation memory (TM) is a database of human translations. As you translate new sentences, we automatically search all available translation memories for previous translations similar to your new sentence. If such sentences exist, we rank and then show them to you. Comparing your translation to previous human translations improves consistency and saves you time: you can reuse previous translations or adjust them to create new, more contextually appropriate translations. When you finish translating documents in Google Translator Toolkit, we save your translations to a translation memory so you or other translators can avoid duplicating work.” (Using translation memories, retrieved 15:51, 27 January 2010 (UTC)).
Mozilla Projects / Firefox
Let's examine a few features of the Mozilla L10N strategy.
In the Mozilla project, localization strings are managed through XUL. The following fragment defines two strings to be displayed as so-called tokens
<caption label="&<b class="token">identityTitle.label</b>;"/>
<description>&<b class="token">identityDesc.label</b>;</description>
identityTitle.label and identityDesc.label will be substituted by a strings defined in a DTD as entities.
<!ENTITY <b class="token">identityTitle.label</b> "Identity">
<!ENTITY <b class="token">identityDesc.label</b> "Each account has an identity, which is the ↵
↳ information that other people see when they read your messages.">
("↵ ↳" indicates a single line, broken for readability) If you want to find more example constants defined as XML entities, locate the firefox installation directory on your computer and examine chrome/ab-CD.jar file, e.g. en-US.jar.
The L10N tools include
- Use of a text editor that can handle UTF-8 files
- A langpack2cvstree.sh script that converts the en-US language package into another locale.
- A command line/web tool, called compare locales: finds missing and obsolate strings in a localization
- Example: short text that describes the tool
- MozillaBuild: An easy way to install everything you need to checkout/pull and checkin/push your localization and run compare-locales on Windows.
- Mozilla Translator is a tool to help translate programs
- Narro, is a web application that allows online translation and coordination. You can see in operation at l10n.mozilla.org
- Translate Toolkit (moz2po and po2moz): converts various sorts of Mozilla files to Gettext PO format for translation efforts using a PO editor and the other way round. It's used be Pootle for example (see below).
- MozLCDB similiar to PO but more dedicated to Mozilla products
- Pootle a web server for localisation that allows web-based contributions and management. Combined with the Translate Toolkit it allows Mozilla products to be localised online.
- Virtaal is an off-line PO editor developed by the Pootle team.
We wonder a bit who uses which tools. If we understand the situation right, there are several ways to translate as long as the translation does find its way back into the CVS at some point.
Also, it is not suprising, that the project makes a clear distinction between "official releases" and others. Official releases include translation of the installation and migration process, localizing the start page and other web pages built into the product, customizing settings like "live bookmarks", locally relevant search engine plugins, and more.
Now I wonder if translation learning management systems into different languages could imply using a different pedagogical vocabulary. E.g. "french didactics" vs. "belgian instructional design" vs. "canadian constructivism" (without any English word) vs. Swiss "let's have a bit of all".
LAMS
LAMS is a system for authoring and delivering learning activities, i.e. a learning design and learning activity management software. Since I made a little commitment to help with translation, I also shall insert some comments regarding features and UI elements that we could make better - Daniel K. Schneider 14:33, 22 January 2010 (UTC).
This open source project developed an original online solution to support translation. Software translators will use two tools:
- An internationalization site (see below)
- A LAMS Translators server (that is upgrade after a translation effort so that a translator can check the look and feel on a live system and make sure that there are no conceptual errors)
- Editing translation strings
Strings to translate are called labels. A string may include slots to filled in with data and use the syntax {0}, {1}, etc.
The LAMS Internationalization site presents a list of available software modules and within each module, completion is shown as the following picture shows.
Translators can display a module, each label is displayed as a list of
- English String, a french translation, update date, translator
The translator then can choose between:
- editing an isolated label
- bulk translate all labels (date and author information of all tags will be overwritten)
- translate only missing labels
Below are a few screenshots that illustrate the principle
- Homogenization
For the moment, there is no support for language-specific glossary and terminology management.
Good translators may develop several strategies to make sure that a common terminology is used.
Terminology must be homogeneous, e.g. the English word "Cancel" should normally translate to the same french word, e.g. "annuller" (v.s. "abandonner"). One also could argue that translators sometimes have to use less meaningful words, because at some point MS made certain decisions and people are used to it.
"Non-natural" terminology like "branching" or "gates" raise additional challenges. In french I translated this to "verrou" after looking at the German "Sperre". For various reasons I don't like the obvious translation of "Gate" to "Portail" (meaning "Portal"). Verrou means "lock" but also could mean a barrier in some geological sense. Anyhow, if several translators participate or if one translator translates once in a while, there is a very high chance that very different words are being used to deal with the same object. Also in my case, only once I decided to use LAMS for real, I stumbled upon terminology (created by myself) and that I really found unclear.
Fortunately in LAMS (and this is a mission-critical feature), translators can search for labels in any language (Note: Not true actually only in English and the target language) and then see the result in a similar fashion as in the bulk translate window.
This feature allows to see (1) where certain words of the target language are being use (e.g. check if the same word is used for different things, which may or may not be a proble) and (2) more importantly, how a given english word or phrase has been translated.
Since the result displayed is an editable bulk translation window, re-translation is pretty fast.
- Testing
Testing is basically done by playing with a translator's test server. Translators have full rights, e.g. can explore the admin interface, create new teacher and student users and play with these. When the project manager sees that new important translations have been comitted he will update the test server, in particular the weeks before a software upgrade.
- Under the hood
($$$ addition needed)
- Labels are Java properties and my include slots for arguments, e.g.
The {0} cannot be within the range of an existing condition.
Formats and software
Lowlevel formats
There exist various methods to handle language strings in computer programs.
Most often, some sort of constants are being used in the code. These constants are then defined either in so-called languages files or in a database. Let's have a look at some examples:
In a mediawiki (the software used for this wiki). Translations of language strings can either sit in files or in a database or both.
To ease translation work, language files can be generated from more highlevel formats, such as Gettext or XLIFF. Also Gettext and XLIFF can be generated from computer code and merged back into it.
Gettext
Gettext is a translation strings strategy and technology, popular in open source. is based on the idea that keys used to retrieve local language strings corresponds to the original string used in the source code. Documentation also is added as programming comment just before the corresponing line. From the source code so-called .PO files that are then use by translators.
XLIFF
- XLIFF (XML Localization Interchange File Format) is an XML-based format created to standardize localization. I.e. XLIFF is an alternative to Gettext. XLIFF was standardized by OASIS in 2002. According to the Specification 1.2, XLIFF is the XML Localization Interchange File Format designed by a group of software providers, localization service providers, and localization tools providers. It is intended to give any software provider a single interchange file format that can be understood by any localization provider. It is loosely based on the OpenTag version 1.2 specification and borrows from the TMX 1.2 specification. However, it is different enough from either one to be its own format.
<xliff version='1.2'
xmlns='urn:oasis:names:tc:xliff:document:1.2'>
<file original='hello.txt' source-language='en' target-language='fr'
datatype='plaintext'>
<body>
<trans-unit id='hi'>
<source>Hello world </source>
<target>Bonjour le monde</target>
<alt-trans>
<target xml:lang='es'>Hola mundo</target>
</alt-trans>
</trans-unit>
</body>
</file>
</xliff>
The most important elements are <trans-unit> and <bin-unit>. They contain the translatable portions of the document. The <trans-unit> element contains the text to be translated, the translations, and other related information. The <bin-unit> contains binary data that may or may not need to be translated; it also can contain translated versions of the binary object as well as other related information.
In the trans-unit element, text to be tranlated is contained in a <source> element, the translated tell will be within the <target> element.
There exist convertors from PO to XLIFF and the other way round.
The OAXAL framework
- Open Architecture for XML Authoring and Localization (OAXAL) is a reference model defining the component parts of XML publishing with respect to the authoring and Localization aspects of the process. It has three main goals:
- Consistency in the authored source-language text
- Consistency in the translated text
- Substantial automation of the Localization process
This references model, according to the Reference Model for Open Architecture for XML Authoring and Localization 1.0, retrieved 17:36, 29 January 2010 (UTC) has two prime use cases:
- Support authoring and localization within a CMS environment
- Support consistency within a translation-only environment
To achieve these goals and handle the use cases OAXAL defined the following software stack:
Let's shortly describe some of these elements:
- ITS (Internationalization Tag Set (ITS) Version 1.0) “is a technology to easily create XML which is internationalized and can be localized effectively. On the one hand, the ITS specification identifies concepts (such as "directionality") which are important for internationalization and localization. On the other hand, the ITS specification defines implementations of these concepts (termed "ITS data categories") as a set of elements and attributes called the Internationalization Tag Set (ITS).”. This work is explained in a Best Practices for XML Internationalization working group note.
Basically ITS allows you to insert tags into an XML tree that will tell systems (and human readers) of XML what should not be translated....
Below we reproduce a simple ITS example:
<dbk:article
xmlns:its="http://www.w3.org/2005/11/its"
xmlns:dbk="http://docbook.org/ns/docbook"
its:version="1.0" version="5.0" xml:lang="en">
<dbk:info>
<dbk:title>An example article</dbk:title>
<dbk:author
its:translate="no">
<dbk:personname>
<dbk:firstname>John</dbk:firstname>
<dbk:surname>Doe</dbk:surname>
</dbk:personname>
<dbk:affiliation>
<dbk:address>
<dbk:email>foo@example.com</dbk:email>
</dbk:address>
</dbk:affiliation>
</dbk:author>
</dbk:info>
<dbk:para>This is a short article.</dbk:para>
</dbk:article>
OAXAL defines (fairly complicated) workflows for document life cycles. From a high-level perspective, the life-cycle for localization-only workflows (e.g. as in sofware localization) looks like this:
The key step is to extract some XLIFF file with translation string that then will be used to translate and merged back.
- Translation Memory eXchange (TMX) “is an open XML standard for the exchange of translation memory data created by computer-aided translation and localization tools. TMX is developed and maintained by OSCAR (Open Standards for Container/Content Allowing Re-use), a special interest group of LISA (Localization Industry Standards Association” (Wikipedia, retr. jan 2010).
Software
- PO editors
- See gettext
- Free translation tools
- ForeignDesk (idle project since 2002)
- Free online translation support systems
- Google translate. Note: There is a way to get support for mediawiki files. Upload a filed called *.mediawiki. E.g. edit wikipage and copy/paste source or get it by &action=raw URL Get parameter, etc. Do not whine if some formatting doesn't render well (each mediawiki is different). This is a non-documented feature and we guess that it may become official after some further development - Daniel K. Schneider 17:36, 29 January 2010 (UTC).
- Frameworks
- The Okapi Framework is a set of interface specifications, format definitions, components and applications that provides an environment to build interoperable tools for the different steps of the translation and localization process. (Seems to be a live project).
- Okapi filters
- Components
- downloads (include a localization toolbox and a translation memory editor).
open source localization guidelines
The idea behind writing this entry was to come up with some suggestions/guidelines concerning localization in volunteer-based substantial open source projects.
(under construction - 21:04, 28 January 2010 (UTC) !)
Profit from the fact that translators probably will use the software and that they will stumble on surface usability and cognitive ergonomics issues. I.e. stuff that is hard to translate may be problematic in the source language to start with
Early in the project, start a collaborative glossary/terminology project. The glossary should include all major terms used in the UI or the manuals. Such a project should have the following features for a single meaning term (for polysemic objects this must be expanded):
- Entry term
- Equivalents (but that should not be used in UI and the system messsages)
- Definitions
- Contexts (e.g. for an LMS, that would be types of pedagogical scenarios)
- Sources
A online translation system should be created (or used) and have the following features
- display equivalents of a term in each language and at least be able to concurrently display source + two target languages.
- search all terms in target or source language and then display all the terms again in triplets as above
- display information and (if existing) constant/parameter string used by coders
- be able to annote each term with comments or maybe an semi-automatic link to a glossary where comments could be made (precision neeeded here)
- Tranlated items should be clearly attributed to a single translator, including revisions if possible.
There should be some kind of incentives for the translators. One also might try to organize monetary incentives provided by localized communities (this is about OSS without resources to pay tranlators). E.g. a German fund might pay money to german translators and a French fund might pay a trip to Australia.
User's should be given the opportunity to participate. Windows and Google do it, so can we. I.e. there should button that a user can hit for feedback. Feedback should be very simple, e.g. with radio buttons (but include the possibility to add a message.)
- This is not clear (why)
- This has a typo (what)
- There is a better term (which/why)
- This uses a different language (with respect to what)
- etc.
The big question is where to position such a button and how it could detect what the user is talking about ...
Links
Texts and web sites about software localization
- Definitions
- Organizations
- Localization Industry Standard Association (LISA)
- The Globalization Insider (LISA News)
- Globalization and Localization Association (GALA)
- Technology Development for Indian Languages (TDIL)
- W3CInternationalization (I18n) Activity - [http://www.w3.org/International/its/ig/ Internationalization Tag Set Interest Group
Home Page]
- How-to / tutorials
- Articles, best practices & tutorials. A list provided by the W3C Internationalization (I18n) Activity: Making the World Wide Web truly world wide!
- Localisation Guide and Document translation at Sourceforge.
- How to Localize Software (Developer-resource.com, retrieved 16:27, 21 January 2010 (UTC)).
- Software Localization versus Translation, TranslatorsCafé.com, by Alexander Schunk. Submitted on March 22, 2008
- for Developing Non-English Web Sites by [http://tlt.psu.edu/ Teaching and Learning with Technology, Penn State University (includes several good web pages, e.g. about HTML "lang" attribute.
- Seven Habits, advice from LISA.
- The Pavel Terminology Tutorial. This is substantial tutorial on all aspects regarding terminology made by the Canadian Translation bureau (make sure to explore all the menus, there are dozens of pages ...)
- XLIFF
- XML in localisation: Use XLIFF to translate documents by Rodolfo M. Rava.
- XLIFF: An Aid To Localization, retrieved 17:36, 29 January 2010 (UTC). (Oracle/Sun Developer Network (SDN).
- About language file formats
- XLIFF: An Aid To Localization by John Corrigan and Tim Foster, Sun Developer Network.
- XLIFF (Wikipedia)
Glossaries and other structured data
- Internationalization
- Dotnet-culture.net provides information about date/time and decimal/currency representation.
- Terminology databases
- Le grand dictionnaire terminologique (english/french)
- The Open Terminology Forum french, english, spanish, chinese (?). Limited to some subdomains / maintained by volonteers. This website might be of interest to some open source localization projects, either for hosting a project or to get some inspiration.
- Termium Plus (The Government of Canada's terminology and linguistic data bank).
- wiktionary.org (The free dictionary, a sister project of wikipedia). Includes translations.
Other
- Example Projects and languages
- Mozilla
- Does Internalization through the XUL User Interface Language
- L10n:Home Page
- Mozilla Localization Project (Archives)
- Courses
- Software Localization MCLS 600012 taught by Gregory M. Shreve. (2001, retrieved 16:27, 21 January 2010 (UTC)). Includes PPT and HTML files for reading. Also available here
- Lecture 1
- What is Software Localization? A list Alexa Dubreuil that summarizes the course syllaus
Software links
- Translate Toolkit (Wikipedia)
- Sun Open Language Tools (XLIFF system)
- More: Trados® Freelance™, Atril Déjà Vu, STAR Transit, SDLX™, IBM TranslationManager,
- Indexes
- Translators’ On-Line Resources, hosted by the Translation Journal.
- Free and Open Source Software for Translators, an index page made by Corinne McKay (Translator)
Bibliography
- Dohler, Per N. (1979). Facets of Software Localization, A Translator's View. Translation Journal 1, July 1997. (retrieved 16:27, 21 January 2010 (UTC))
- McKethan, Kenneth A. (Sandy)Jr. and Graciela White (2005). Demystifying Software Globalization, Translation Journal 9 (2), April 2005. HTML, retrieved 16:27, 21 January 2010 (UTC).
- Esselink Bert (2000), A Practical Guide to Localization, , John Benjamins Publishing, ISBN 1-58811-006-0
- Karunakar, G. and Shirvastava, R. (2007) Evolving Translations & Terminology - The Open Way, LRIL-2007 - National Seminar on Creation of Lexical Resources for Indian Language Computing and Processing at C-DAC Mumbai. PDF