Ubiquity Localization Update

Jun 12, 2009

As we move closer to shipping Ubiquity 0.5, there is still much work to be done, particularly in the area of localization. In a recent Ubiquity meeting we laid out the explicit localization goals and non-goals for 0.5 as follows:

Goals for 0.5
- Parser 2 (on by default)
- underlying support for localization of commands
- localization of standard feed commands for a few languages
- Parser 2 language files for those same languages
Nongoals for 0.5
- distribution/sharing of localizations
- localization of nountypes

The overall goal for this release of Ubiquity is to come up with a format and standard for localization. Localizations in Ubiquity 0.5 will only apply to commands bundled with Ubiquity, and the localization files themselves will be distributed with Ubiquity. In a future release we will tackle the problem of localizations for commands in the wild and truly crowd-source¹ this process.

Localization Architecture

The localization of Ubiquity commands will use a gettext-style approach where localization files list key-value pairs for different properties and messages of the commands. For Ubiquity 0.5, where we only deal with the standard command feeds bundled with Ubiquity, we can simply place all the localization files in ubiquity/standard-feeds/localization. Localization files are organized by source feed, with one localization file per source feed, per language.

The localizable components of commands will include the names, contributors, and help properties, as well as any localizable strings in the command’s preview() and execute() methods. To make strings localizable in preview() and execute(), they must be wrapped in the localize function, _().²

Other localizable components, like names, contributors, and help, will not need to be wrapped in the _() function. In addition, as the localization files can only hold string values, properties with multiple values, such as names and contributors, can use | as a delimiter. For example:

zoom.names=ズーム|ズームして|ズームする|ズームしろ
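To make the delimiter convention concrete, here is a minimal sketch of how such a value might be turned back into a list. The helper name splitLocalizedValues is hypothetical and not part of Ubiquity; trimming whitespace and dropping empty entries are my own assumptions.

```javascript
// Hypothetical helper (not a Ubiquity API): split a pipe-delimited
// localization value into an array of individual values.
function splitLocalizedValues(value) {
  return value
    .split('|')
    .map(function (s) { return s.trim(); })          // assumption: ignore stray spaces
    .filter(function (s) { return s.length > 0; });  // assumption: drop empty entries
}

var names = splitLocalizedValues('ズーム|ズームして|ズームする|ズームしろ');
// names now holds the four Japanese names from the example above
```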

The Localization Experience

One tool we have planned to help kickstart the localization process will automatically create a template of the strings that need localization in a user’s commands. I took a first stab at this tool today. Clicking on the “get localization template” link next to each feed in the Ubiquity command list will give you a template which you can then copy into a text file.

Additionally, instructions will later be added to this page to specify how and where to save localizations in order to test them, or perhaps we can add a button that will automatically save the template in the right location.
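To give a feel for what such a template could contain, here is a rough sketch that builds template lines for the names, contributors, and help properties of one command. This is not the actual tool: the function localizationTemplate, the plain-object command shape, and the choice of the first name as the key prefix are all assumptions made for illustration, and strings inside preview() and execute() are left out.

```javascript
// Hypothetical sketch (not the real template tool): build .properties-style
// template lines for one command. `command` is assumed to be a plain object
// with optional `names`, `contributors`, and `help` properties.
function localizationTemplate(command) {
  var id = command.names[0]; // assumption: first name serves as the key prefix
  var lines = [];
  ['names', 'contributors', 'help'].forEach(function (prop) {
    var value = command[prop];
    if (value === undefined) return;          // skip properties the command lacks
    if (Array.isArray(value)) value = value.join('|'); // multiple values use |
    lines.push(id + '.' + prop + '=' + value);
  });
  return lines.join('\n');
}

var template = localizationTemplate({
  names: ['zoom', 'magnify'],
  contributors: ['Example Author'], // placeholder value
  help: 'Zooms the page.'           // placeholder value
});
```

A localizer would then replace the right-hand side of each line with the translated values.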

Open Questions

Localization file formats

There are two kinds of file formats for localizations we are considering: .properties and .po, the native gettext format. As an example, here is the same key-value pair in the two formats:

`.properties`:

# This is a comment
welcomeMessage=Hello, world!

`.po`:

#. This is a comment (the . is actually optional)
msgid "welcomeMessage"
msgstr "Hello, world!"
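To show that the two formats carry the same information, here is a deliberately simplified sketch that reads each into a plain key-value object. Both parsers are hypothetical and cover only the cases shown above: the .properties one ignores escapes, line continuations, and the alternative : separator, and the .po one handles only single-line msgid/msgstr pairs (no plural forms, contexts, or multi-line strings).

```javascript
// Simplified .properties reader: "key=value" lines, "#" comment lines.
function parseProperties(text) {
  var result = {};
  text.split('\n').forEach(function (line) {
    line = line.trim();
    if (line === '' || line.charAt(0) === '#') return; // blank or comment
    var eq = line.indexOf('=');
    if (eq === -1) return;                             // not a key=value line
    result[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  });
  return result;
}

// Simplified .po reader: pairs each single-line msgid with the msgstr after it.
function parsePo(text) {
  var result = {};
  var currentId = null;
  text.split('\n').forEach(function (line) {
    var m = line.trim().match(/^(msgid|msgstr)\s+"(.*)"$/);
    if (!m) return;                                    // comments and anything else
    if (m[1] === 'msgid') {
      currentId = m[2];
    } else if (currentId !== null) {
      result[currentId] = m[2];
      currentId = null;
    }
  });
  return result;
}

var fromProperties = parseProperties('# This is a comment\nwelcomeMessage=Hello, world!');
var fromPo = parsePo('#. This is a comment\nmsgid "welcomeMessage"\nmsgstr "Hello, world!"');
// Both objects map welcomeMessage to the same string.
```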

The advantage of .properties over .po is that Mozilla natively supports this format with an XUL/XPCOM interface called stringbundle, and it is what is used to localize JavaScript in Firefox itself. We actually already have the _() localization function working with the properties file format, following gomita’s great instructions (Japanese) on how to load properties files using Mozilla’s native stringbundle tools via JavaScript.

The advantage of .po over .properties is that it is the de facto standard in localization, particularly in the UNIX world. Lots of great tools have been written for it. The adoption of .po could make Ubiquity localization accessible to more people. Another advantage is that .po files can have keys with spaces, as I note below.

If we do opt to work with .po files, the two libraries I see out in the wild for dealing with .po files are gettext-js (MIT) and jsgettext (LGPL). While I haven’t looked at the libraries in depth yet, so far jsgettext seems to be the winner, as some sections of gettext-js require the use of the prototype.js library.

A “key” question

In either file format, we need a unique way to refer to each localizable string—a key format. As each localization file refers to a command feed, the first collision we must avoid is the command name. With this in mind, we can come up with some trivial keys for the localizable properties: (here, consider the command hello)

hello.names
hello.contributors
hello.help

However, we run into difficulty when we try to come up with keys for the arbitrary text in previews and executes. For example, for a message like “Hello world!” in the preview, we could simply make the key hello.preview.Hello world! but this may be unwieldy and prone to typos. In addition, keys in .properties files cannot contain certain characters, like spaces, so we would have to make the key something like hello.preview.Hello_world! or, stripping symbols and standardizing case, hello.preview.HELLO_WORLD.

Keys could also get very long with this type of key format, although here again .po files may have an advantage as they can stay relatively legible even with long keys. One option to deal with this would be to optionally supply a key argument to _() so that it is used instead of the automatic key. For example, suppose the hello command’s preview() included this code:

_('This is a really long greeting message. Hello there!','longmessage')

then a localizer would only have to refer to hello.preview.longmessage, not hello.preview.THIS_IS_A_REALLY_LONG_GREETING_MESSAGE_HELLO_THERE.
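The automatic key and the optional override can be sketched together as follows. The function makeKey and its argument order are hypothetical, and the normalization (uppercase, runs of non-alphanumeric characters collapsed to a single underscore) is just one way to arrive at the keys shown above. Note that this sketch only handles ASCII source strings.

```javascript
// Hypothetical sketch of key generation (not Ubiquity's implementation).
// If customKey is given it is used as-is; otherwise the message is
// uppercased and every run of non-alphanumeric characters becomes "_".
function makeKey(commandName, method, message, customKey) {
  var suffix = customKey || message
    .toUpperCase()
    .replace(/[^A-Z0-9]+/g, '_')  // spaces and symbols -> single underscore
    .replace(/^_+|_+$/g, '');     // trim leading/trailing underscores
  return commandName + '.' + method + '.' + suffix;
}

var autoKey = makeKey('hello', 'preview', 'Hello world!');
// autoKey is 'hello.preview.HELLO_WORLD'

var shortKey = makeKey('hello', 'preview',
  'This is a really long greeting message. Hello there!', 'longmessage');
// shortKey is 'hello.preview.longmessage'
```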

satyr points out that some commands use another function to incorporate similar actions and messages in both preview() and execute(). In this case, he argues, it wouldn’t make sense to have to keep both localizations (hello.preview.… and hello.execute.…). He suggests that optional keys (mentioned above) could be used without the preview. or execute. infixes, as in hello.longmessage. By taking out the preview and execute namespacing in the localization keys, though, it becomes the command author’s responsibility to not accidentally use strings named “names”, “help”, etc. that will have unintended consequences.
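One way to soften that responsibility would be to check custom keys against the reserved property names when the key is built. The sketch below is only an illustration of the idea: makeFlatKey is a hypothetical name, and the reserved list is my assumption based on the localizable properties discussed above.

```javascript
// Hypothetical guard for the flat key scheme (hello.longmessage).
// The reserved list is an assumption drawn from the properties named above.
var RESERVED_KEYS = ['names', 'contributors', 'help'];

function makeFlatKey(commandName, customKey) {
  if (RESERVED_KEYS.indexOf(customKey) !== -1) {
    throw new Error('Key "' + customKey + '" collides with a reserved property');
  }
  return commandName + '.' + customKey;
}

var flatKey = makeFlatKey('hello', 'longmessage');
// flatKey is 'hello.longmessage'; makeFlatKey('hello', 'help') would throw
```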

Conclusion

I hope that this blog post gives people an idea of the progress we’ve made in the localization area and gets people thinking about the challenges we still face. We’d love to get your feedback on the localization format and process in Ubiquity, as well as the open problems of the file format and keys.

¹ Or “cloud-source”… finally a Japanese accent joke that’s semantically stable!
² This function currently also has the ability to do simple printf-formatted string replacements:
```
_('This is a %S.',['test'])
```
Whether this format will replace support for CmdUtils.renderTemplate remains to be seen and is definitely worthy of discussion. If we move away from properties files, in particular, we may keep renderTemplate() in lieu of the printf format. Mozilla’s built-in stringbundle handling just gave us a fast and free implementation of printf-style replacement.
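For readers curious what the %S replacement amounts to, here is a plain JavaScript stand-in. It is not Mozilla’s stringbundle code and not the actual _() function; formatMessage is a hypothetical name, and it only handles sequential %S placeholders.

```javascript
// Hypothetical stand-in for printf-style %S replacement: each %S is
// replaced by the next argument; placeholders without an argument are kept.
function formatMessage(template, args) {
  var i = 0;
  return template.replace(/%S/g, function (match) {
    return i < args.length ? String(args[i++]) : match;
  });
}

var formatted = formatMessage('This is a %S.', ['test']);
// formatted is 'This is a test.'
```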