Ubiquity Localization Update

As we move closer and closer to shipping Ubiquity 0.5, there is still much work to be done, particularly in the area of localization. In a recent Ubiquity meeting we laid out the explicit localization goals and non-goals of the 0.5 release as follows:

  • Goals for 0.5
    • Parser 2 (on by default)
    • underlying support for localization of commands
    • localization of standard feed commands for a few languages
    • Parser 2 language files for those same languages
  • Non-goals for 0.5
    • distribution/sharing of localizations
    • localization of nountypes

The overall goal for this release of Ubiquity is to come up with a format and standard for localization. Localizations in Ubiquity 0.5 will only apply to commands bundled with Ubiquity, and the localization files themselves will be distributed with Ubiquity. In a future release we will tackle the problem of localizations for commands in the wild and truly crowd-source[1] this process.

Localization Architecture

The localization of Ubiquity commands will use a gettext-style approach where localization files list key-value pairs for different properties and messages of the commands. For Ubiquity 0.5, where we only deal with the standard command feeds bundled with Ubiquity, we can simply place all the localization files in ubiquity/standard-feeds/localization. Localization files are organized by source feed, with one localization file per source feed, per language.

The localizable components of commands will include the names, contributors, and help properties, as well as any localizable strings in the command’s preview() and execute() methods. To make strings localizable in preview() and execute(), they must be wrapped in the localize function, _().[2]

Other localizable components, like names, contributors, and help, will not need to be wrapped in the _() function. In addition, because the localization files can only hold string values, properties that take multiple values, such as names and contributors, can use | as a delimiter. For example:

zoom.names=ズーム|ズームして|ズームする|ズームしろ
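As a rough sketch of how an entry like this might be read, the snippet below parses key=value lines into a map and splits a multi-valued entry on |. The names parseProperties and splitValues are hypothetical and not part of Ubiquity, and the parsing deliberately ignores the escapes and line continuations that a full key-value file format may allow.

```javascript
// Hypothetical helpers for illustration; not part of Ubiquity's API.
// Parse simple "key=value" lines, skipping blank lines and # comments.
function parseProperties(text) {
  const entries = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not a key=value line
    entries[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return entries;
}

// Split a multi-valued property such as names or contributors on "|".
function splitValues(value) {
  return value
    .split("|")
    .map((v) => v.trim())
    .filter((v) => v !== "");
}

const ja = parseProperties("zoom.names=ズーム|ズームして|ズームする|ズームしろ");
const zoomNames = splitValues(ja["zoom.names"]);
console.log(zoomNames.length); // 4
```

Single-valued properties like help would be used as-is, without the split step.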

The Localization Experience

One tool we have planned to help kickstart the localization process will automatically create a template of the strings that need localization in a user’s commands. I took a first stab at this tool today. Clicking on the “get localization template” link next to each feed in the Ubiquity command list will give you a template which you can then copy into a text file:

[Screenshot: localization-template-smaller.png]

Instructions will later be added to this page specifying how and where to save localizations in order to test them. Alternatively, we may add a button that automatically saves the template in the right location.

Open Questions

Localization file formats

There are two kinds of file formats for localizations we are considering: .properties and .po, the native gettext format. As an example, here is the same key-value pair in the two formats:

.properties:

# This is a comment
welcomeMessage=Hello, world!

.po:

#. This is a comment (the . is actually optional)
msgid "welcomeMessage"
msgstr "Hello, world!"
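To give a feel for what reading the .po form involves, here is a minimal sketch that pulls single-line msgid/msgstr pairs into a map. The function parseSimplePo is hypothetical and is not how gettext-js or jsgettext work internally. It skips multi-line strings, msgctxt, plurals, and escape sequences, all of which real .po files can contain.

```javascript
// Hypothetical minimal reader for single-line msgid/msgstr pairs.
// Multi-line strings, msgctxt, plurals, and escapes are NOT handled.
function parseSimplePo(text) {
  const entries = {};
  let currentId = null;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // blanks and comments
    const match = /^(msgid|msgstr)\s+"(.*)"$/.exec(line);
    if (!match) continue;
    if (match[1] === "msgid") {
      currentId = match[2];
    } else if (currentId !== null) {
      entries[currentId] = match[2];
      currentId = null;
    }
  }
  return entries;
}

const poText = [
  "#. This is a comment",
  'msgid "welcomeMessage"',
  'msgstr "Hello, world!"',
].join("\n");

const messages = parseSimplePo(poText);
console.log(messages.welcomeMessage); // Hello, world!
```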

The advantage of .properties over .po is that Mozilla natively supports this format through a XUL/XPCOM interface called stringbundle, and it is what is used to localize JavaScript in Firefox itself. We actually already have the _() localization function working with the .properties file format, following gomita’s great instructions (in Japanese) on how to load properties files via JavaScript using Mozilla’s native stringbundle tools.

The advantage of .po over .properties is that it is the de-facto standard in localization, particularly in the UNIX world. Lots of great tools have been written for it. The adoption of .po could make Ubiquity localization more accessible for more people. Another advantage is that .po files can have keys with spaces, as I note below.

If we do opt to work with .po files, the two libraries I see out in the wild for dealing with .po files are gettext-js (MIT) and jsgettext (LGPL). While I haven’t looked at the libraries in depth yet, so far jsgettext seems to be the winner, as some sections of gettext-js require the use of the prototype.js library.

A “key” question

[Image: icanhaskeyplz.jpg]

In either file format, we need a unique way to refer to each localizable string—a key format. As each localization file refers to a command feed, the first collision we must avoid is the command name. With this in mind, we can come up with some trivial keys for the localizable properties: (here, consider the command hello)

  • hello.names
  • hello.contributors
  • hello.help

However, we run into difficulty when we try to come up with keys for the arbitrary text in previews and executes. For example, for a message like “Hello world!” in the preview, we could simply make the key hello.preview.Hello world! but this may be unwieldy and prone to typos. In addition, keys in .properties files cannot contain certain characters, like spaces, so we would have to make the key something like hello.preview.Hello_world! or, stripping symbols and standardizing case, hello.preview.HELLO_WORLD.
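The “strip symbols and standardize case” idea could be sketched roughly as follows. The function autoKey is a hypothetical name, and the exact normalization rules (uppercase, anything other than A-Z and 0-9 collapsed to a single underscore) are assumptions for illustration rather than a settled format.

```javascript
// Hypothetical helper: derive a .properties-safe key from a message.
// Assumes source strings are ASCII English; other text would be stripped.
function autoKey(command, method, message) {
  const slug = message
    .toUpperCase()
    .replace(/[^A-Z0-9]+/g, "_") // runs of spaces/symbols become one "_"
    .replace(/^_+|_+$/g, ""); // drop leading and trailing "_"
  return command + "." + method + "." + slug;
}

console.log(autoKey("hello", "preview", "Hello world!"));
// hello.preview.HELLO_WORLD
```

One consequence of normalizing this way is that strings differing only in punctuation or case, such as “Hello world!” and “Hello, world”, end up with the same key.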

Keys could also get very long with this type of key format, although here again .po files may have an advantage as they can stay relatively more legible even with long keys. One option to deal with this would be to optionally supply a key argument to _() so that it is used instead of the automatic key. For example, suppose the hello command’s preview() included this code:

_('This is a really long greeting message. Hello there!','longmessage')

then a localizer would only have to refer to hello.preview.longmessage, not hello.preview.THIS_IS_A_REALLY_LONG_GREETING_MESSAGE_HELLO_THERE.
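A rough sketch of how an optional key might be resolved is below. Everything here is hypothetical: makeLocalizer, the shape of the translations map, and the placeholder translation string are invented for illustration, and the sketch ignores the replacement-array argument that _() currently accepts.

```javascript
// Hypothetical sketch: build a _() bound to one command and one method.
function makeLocalizer(command, method, translations) {
  return function _(message, key) {
    // Prefer the explicit key; otherwise derive one from the message text.
    const suffix =
      key ||
      message
        .toUpperCase()
        .replace(/[^A-Z0-9]+/g, "_")
        .replace(/^_+|_+$/g, "");
    const fullKey = command + "." + method + "." + suffix;
    // Fall back to the original string when no translation is available.
    return Object.prototype.hasOwnProperty.call(translations, fullKey)
      ? translations[fullKey]
      : message;
  };
}

// Placeholder data standing in for a loaded localization file.
const translations = { "hello.preview.longmessage": "(localized greeting)" };
const _ = makeLocalizer("hello", "preview", translations);

console.log(_("This is a really long greeting message. Hello there!", "longmessage"));
// (localized greeting)
```

Falling back to the source string means a command with a partial localization still shows something sensible.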

satyr points out that some commands use another function to incorporate similar actions and messages in both preview() and execute(). In this case, he argues, it wouldn’t make sense to have to keep both localizations (hello.preview.… and hello.execute.…). He suggests that optional keys (mentioned above) could be used without the preview. or execute. infixes, as in hello.longmessage. By taking out the preview and execute namespacing in the localization keys, though, it becomes the command author’s responsibility to not accidentally use strings named “names”, “help”, etc. that will have unintended consequences.
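If the infixes were dropped, one way to catch such accidents early would be a simple guard like the following sketch. The list of reserved keys and the function checkMessageKey are assumptions for illustration; nothing like this exists in Ubiquity as far as this post describes.

```javascript
// Hypothetical guard for the un-namespaced scheme (e.g. hello.longmessage).
// The reserved list mirrors the localizable properties named in this post.
const RESERVED_KEYS = ["names", "contributors", "help"];

function checkMessageKey(command, key) {
  if (RESERVED_KEYS.includes(key)) {
    throw new Error(
      '"' + key + '" is reserved for a property of the ' + command + " command"
    );
  }
  return command + "." + key;
}

console.log(checkMessageKey("hello", "longmessage")); // hello.longmessage
```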

Conclusion

I hope that this blog post gives people an idea of the progress we’ve made in the localization area and gets people thinking about the challenges we still face. We’d love to get your feedback on the localization format and process in Ubiquity, as well as the open problems of the file format and keys.


  1. Or “cloud-source”… finally a Japanese accent joke that’s semantically stable! 

  2. This function currently also has the ability to do simple printf-formatted string replacements:

    _('This is a %S.',['test'])

    Whether this format will replace support for CmdUtils.renderTemplate remains to be seen and is definitely worthy of discussion. If we move away from properties files, in particular, we may keep renderTemplate() in lieu of the printf format. Mozilla’s built-in stringbundle handling just gave us a fast and free implementation of printf-style replacement. 
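For readers unfamiliar with this style of replacement, the sketch below imitates the sequential %S behavior in plain JavaScript. It is a stand-in written for illustration (formatMessage is a hypothetical name), not the stringbundle implementation, and it handles only plain %S placeholders.

```javascript
// Hypothetical stand-in for printf-style replacement: each %S is replaced,
// in order, by the next item in args. Extra placeholders are left untouched.
function formatMessage(template, args) {
  let index = 0;
  return template.replace(/%S/g, (placeholder) =>
    index < args.length ? String(args[index++]) : placeholder
  );
}

console.log(formatMessage("This is a %S.", ["test"])); // This is a test.
```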

Responses to “Ubiquity Localization Update”

  1. toniher's status on Friday, 12-Jun-09 10:57:56 UTC - Identi.ca Says:

    […] Ubiquity localisation update: http://mitcho.com/blog/projects/ubiquity-localization-update/ […]

  2. Blair McBride Says:

    I'm torn between the two options here. I prefer the syntax of .properties (and the built-in tools), but like that .po works such that you can do:

    msgid "Hello, World!"
    msgstr "Hallo, Welt!"

    Where the ID is just the English version of the string. No need to decide on some normalized ID that conforms to an arbitrary rule. When translating, you get the string you're translating from without needing to look up the source code, try the command, or have an additional comment (which I think is usually what happens in Mozilla-land). And as an upside, if the original text changes and needs to be re-translated, then you get the ID change for free.

  3. Stas Malolepszy Says:

    gettext files need to be compiled before you can use them. Would you need to recompile them after every feed update?

  4. marsf Says:

    Mozilla localizers are familiar with the ".properties" format. With ".properties", there would be an "English.properties" file, which makes it easy to start localizing. I don't think spaces are needed in the keys.

    For external command feeds, how about defining them within the command, like this?

    CmdUtils.CreateCommand({
      …
      messages: {
        "execute.hello": {en: "Hello!", ja: "こんにちは!"},
        "preview.hello": {en: "Hello!", ja: "こんにちは!"}
      }
    });

  5. mitcho Says:

    Stas, if we go with gettext (po) files, we'd be reading them in using JavaScript, and both of those public JS gettext libraries support only po, not mo. If we just use them as is, we won't need to deal with the compiling issue.

  7. mitcho Says:

    Do you mean we should make all localizable strings use keys in the code and put all localizable content in the localization files? So in the command,

    _('helloMessage')

    and then

    hello.preview.helloMessage=Hello, world!

    Is that what you mean?

  8. marsf Says:

    > Is that what you mean?

    Yes. (except for verb names.)

  9. Julen Says:

    If you choose PO over properties the template thing is straightforward.

    The ID changes are also handled better in the gettext format and if you think a msgid (source text) is ambiguous, you can use the msgctxt field to specify the context and disambiguate it.

    Also please consider reading this about monolingual file formats: http://translate.sourceforge.net/wiki/guide/monolingual

  10. Axel Hecht Says:

    We shouldn't use po in its monolingual variant, that's just gonna make folks puke.

    As for everything else, Gandalf and I are currently working on a json variant for l20n, which I owe .platform a post about. Not sure how well that's going to go, though.

    I'm not sure how well re-implementing a po parser in js will go, it's usually easy to make the common things work, but if you're facing the output of a dozen different tools, you might have made faulty assumptions.

  11. mitcho Says:

    Hi Axel,

    Thanks for the comment. I've actually been playing around with jsgettext (http://jsgettext.berlios.de/) today and have been quite happy with it so far… It implements its own po parser but internally works with it as json. It even supports a json file format in addition to po, though I don't know how similar it is to what you and Gandalf have been working on.

    We're hoping to get something workable out the door in the next week or so, though, so I think it's looking like po is the winner right now.

  14. Gamre Says:

    Awesome picture, Haha

  16. neil Says:

    I think you'll do fine with the .po version. As I understand it, it is far superior for use with multilingual programs, and Japanese/English should do well.



© 2006-2010 mitcho (Michael 芳貴 Erlewine).
The views expressed on these pages are mine alone and do not
reflect those of my employers and clients, past and present.