Ubiquity Localization Update
As we move closer and closer to shipping Ubiquity 0.5, there is still much work to be done, particularly in the area of localization. In a recent Ubiquity meeting we laid out the explicit localization goals and non-goals of Ubiquity 0.5 as follows:
- Goals for 0.5
  - Parser 2 (on by default)
  - underlying support for localization of commands
  - localization of standard feed commands for a few languages
  - Parser 2 language files for those same languages
- Non-goals for 0.5
  - distribution/sharing of localizations
  - localization of nountypes
The overall goal for this release of Ubiquity is to come up with a format and standard for localization. Localizations in Ubiquity 0.5 will only apply to commands bundled with Ubiquity, and the localization files themselves will be distributed with Ubiquity. In a future release we will tackle the problem of localizations for commands in the wild and truly crowd-source[1] this process.
Localization Architecture
The localization of Ubiquity commands will use a gettext-style approach where localization files list key-value pairs for different properties and messages of the commands. For Ubiquity 0.5, where we only deal with the standard command feeds bundled with Ubiquity, we can simply place all the localization files in ubiquity/standard-feeds/localization. Localization files are organized by source feed, with one localization file per source feed, per language.
The localizable components of commands will include the names, contributors, and help properties, as well as any localizable strings in the command’s preview() and execute() methods. To make strings localizable in preview() and execute(), they must be wrapped in the localize function, _().[2]
Other localizable components, like names, contributors, and help, will not need to be wrapped in the _() function. In addition, as the localization files can only hold string values, the delimiter | can be used to separate multiple values for properties such as names and contributors. For example:
zoom.names=ズーム|ズームして|ズームする|ズームしろ
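A minimal sketch of how a pipe-delimited value like the one above might be turned back into a list of names. The helper name splitLocalizedValue is hypothetical and not part of Ubiquity; it only illustrates the delimiter idea.

```javascript
// Hypothetical helper (not part of Ubiquity): split a pipe-delimited
// localization value into its individual entries, ignoring empty ones.
function splitLocalizedValue(value) {
  return value
    .split("|")
    .map(function (entry) { return entry.trim(); })
    .filter(function (entry) { return entry.length > 0; });
}

// The value from the zoom.names example above.
var zoomNames = splitLocalizedValue("ズーム|ズームして|ズームする|ズームしろ");
// zoomNames is an array of four names, the first being "ズーム"
```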
The Localization Experience
One tool we have planned to help kickstart the localization process will automatically create a template of the strings that need localization in a user’s commands. I took a first stab at this tool today. Clicking on the “get localization template” link next to each feed in the Ubiquity command list will give you a template which you can then copy into a text file.
Additionally, instructions will later be added to this page to specify how and where to save localizations to test them, or perhaps we can add a button that will automatically save the template in the right location.
Open Questions
Localization file formats
There are two kinds of file formats for localizations we are considering: .properties and .po, the native gettext format. As an example, here is the same key-value pair in the two formats:
.properties:
# This is a comment
welcomeMessage=Hello, world!
.po:
#. This is a comment (the . is actually optional)
msgid "welcomeMessage"
msgstr "Hello, world!"
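To make the comparison concrete, here is a rough sketch of reading the same pair out of each format. Both parsers are deliberately simplified and hypothetical (the function names are mine): the .properties one handles only key=value lines and comment lines, and the .po one handles only single-line msgid/msgstr pairs. Real files in either format have more syntax (escapes, line continuations, multi-line strings, plural forms) that a real library would need to cover.

```javascript
// Simplified sketch: only "key=value" lines and "#" or "!" comment lines.
// Real .properties files also allow escapes and line continuations.
function parseSimpleProperties(text) {
  var entries = {};
  text.split(/\r?\n/).forEach(function (line) {
    var trimmed = line.trim();
    if (trimmed === "" || trimmed.charAt(0) === "#" || trimmed.charAt(0) === "!") {
      return; // blank line or comment
    }
    var eq = trimmed.indexOf("=");
    if (eq === -1) {
      return; // no separator; skipped in this sketch
    }
    entries[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  });
  return entries;
}

// Simplified sketch: only single-line msgid/msgstr pairs; comment lines
// and everything else are ignored.
function parseSimplePo(text) {
  var entries = {};
  var currentId = null;
  text.split(/\r?\n/).forEach(function (line) {
    var idMatch = /^msgid\s+"(.*)"\s*$/.exec(line);
    var strMatch = /^msgstr\s+"(.*)"\s*$/.exec(line);
    if (idMatch) {
      currentId = idMatch[1];
    } else if (strMatch && currentId !== null) {
      entries[currentId] = strMatch[1];
      currentId = null;
    }
  });
  return entries;
}

var fromProperties = parseSimpleProperties(
  "# This is a comment\nwelcomeMessage=Hello, world!"
);
var fromPo = parseSimplePo(
  '#. This is a comment\nmsgid "welcomeMessage"\nmsgstr "Hello, world!"'
);
// Both objects map welcomeMessage to "Hello, world!"
```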
The advantage of .properties over .po is that Mozilla natively supports this format with a XUL/XPCOM interface called stringbundle, and it is what is used to localize JavaScript in Firefox itself. We actually already have the _() localization function working with the properties file format, following gomita’s great instructions (Japanese) on how to load properties files using Mozilla’s native stringbundle tools via JavaScript.
The advantage of .po over .properties is that it is the de-facto standard in localization, particularly in the UNIX world. Lots of great tools have been written for it. The adoption of .po could make Ubiquity localization more accessible for more people. Another advantage is that .po files can have keys with spaces, as I note below.
If we do opt to work with .po files, the two libraries I see out in the wild for dealing with .po files are gettext-js (MIT) and jsgettext (LGPL). While I haven’t looked at the libraries in depth yet, so far jsgettext seems to be the winner, as some sections of gettext-js require the use of the prototype.js library.
A “key” question

In either file format, we need a unique way to refer to each localizable string—a key format. As each localization file refers to a command feed, the first collision we must avoid is the command name. With this in mind, we can come up with some trivial keys for the localizable properties: (here, consider the command hello)
hello.names
hello.contributors
hello.help
However, we run into difficulty when we try to come up with keys for the arbitrary text in previews and executes. For example, for a message like “Hello world!” in the preview, we could simply make the key hello.preview.Hello world! but this may be unwieldy and prone to typos. In addition, in .properties files keys cannot have certain characters in them, like spaces, so we would have to make the key something like hello.preview.Hello_world! or, stripping symbols and standardizing case, hello.preview.HELLO_WORLD.
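One way the “strip symbols and standardize case” idea could work is sketched below. The function name makeAutomaticKey and the exact normalization rule (collapse every run of characters other than ASCII letters and digits into one underscore, then uppercase) are my assumptions for illustration; no rule has been settled on.

```javascript
// Hypothetical sketch of automatic key generation. Assumes the source
// message is English: every run of characters other than ASCII letters
// and digits collapses to a single underscore.
function makeAutomaticKey(commandName, methodName, message) {
  var normalized = message
    .replace(/[^A-Za-z0-9]+/g, "_") // symbols and spaces become "_"
    .replace(/^_+|_+$/g, "")        // drop leading/trailing underscores
    .toUpperCase();
  return [commandName, methodName, normalized].join(".");
}

var helloKey = makeAutomaticKey("hello", "preview", "Hello world!");
// helloKey is "hello.preview.HELLO_WORLD"
```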
Keys could also get very long with this type of key format, although here again .po files may have an advantage as they can stay relatively more legible even with long keys. One option to deal with this would be to optionally supply a key argument to _() so that it is used instead of the automatic key. For example, suppose the hello command’s preview() included this code:
_('This is a really long greeting message. Hello there!','longmessage')
then a localizer would only have to refer to hello.preview.longmessage, not hello.preview.THIS_IS_A_REALLY_LONG_GREETING_MESSAGE_HELLO_THERE.
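Here is a rough sketch of how an optional key argument could behave. Everything in it is hypothetical: the makeLocalizer factory, the way _() learns which command and method it belongs to, and the plain object standing in for a loaded localization file. The automatic key rule is the same assumed normalization as in hello.preview.HELLO_WORLD.

```javascript
// Hypothetical factory: builds a _() bound to one command and one method
// (preview or execute). localizedStrings stands in for a loaded
// localization file, as a plain key-value object.
function makeLocalizer(commandName, methodName, localizedStrings) {
  function automaticKey(message) {
    return message
      .replace(/[^A-Za-z0-9]+/g, "_")
      .replace(/^_+|_+$/g, "")
      .toUpperCase();
  }
  return function _(message, key) {
    // Use the explicit key only when a non-empty string is supplied.
    var suffix = (typeof key === "string" && key !== "")
      ? key
      : automaticKey(message);
    var fullKey = [commandName, methodName, suffix].join(".");
    return Object.prototype.hasOwnProperty.call(localizedStrings, fullKey)
      ? localizedStrings[fullKey]
      : message; // no localization found: fall back to the source string
  };
}

var _ = makeLocalizer("hello", "preview", {
  "hello.preview.longmessage": "こんにちは!"
});
var greeting = _('This is a really long greeting message. Hello there!', 'longmessage');
// greeting is looked up under hello.preview.longmessage
```

Falling back to the source string when no localized value exists means a partially translated file still produces usable output.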
satyr points out that some commands use another function to incorporate similar actions and messages in both preview() and execute(). In this case, he argues, it wouldn’t make sense to have to keep both localizations (hello.preview.… and hello.execute.…). He suggests that optional keys (mentioned above) could be used without the preview. or execute. infixes, as in hello.longmessage. By taking out the preview and execute namespacing in the localization keys, though, it becomes the command author’s responsibility to not accidentally use strings named “names”, “help”, etc. that will have unintended consequences.
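If the preview and execute namespacing were dropped, one possible safeguard is to reject keys that collide with the command’s own properties. The sketch below is only an illustration: makeFlatKey is a hypothetical name, and the reserved list contains just the three properties mentioned in this post.

```javascript
// Property names that already have a meaning in a command's localization
// keys (taken from this post: names, contributors, help).
var RESERVED_KEYS = ["names", "contributors", "help"];

// Hypothetical helper: build a key without the preview/execute infix,
// refusing keys that would collide with a command property.
function makeFlatKey(commandName, key) {
  if (RESERVED_KEYS.indexOf(key) !== -1) {
    throw new Error('"' + key + '" is reserved for a command property');
  }
  return commandName + "." + key;
}

var sharedKey = makeFlatKey("hello", "longmessage");
// sharedKey is "hello.longmessage", usable from both preview() and execute()
```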
Conclusion
I hope that this blog post gives people an idea of the progress we’ve made in the localization area and gets people thinking about the challenges we still face. We’d love to get your feedback on the localization format and process in Ubiquity, as well as the open problems of the file format and keys.
[1] Or “cloud-source”… finally a Japanese accent joke that’s semantically stable!
[2] This function currently also has the ability to do simple printf-formatted string replacements:
_('This is a %S.',['test'])
Whether this format will replace support for CmdUtils.renderTemplate remains to be seen and is definitely worthy of discussion. If we move away from properties files, in particular, we may keep renderTemplate() in lieu of the printf format. Mozilla’s built-in stringbundle handling just gave us a fast and free implementation of printf-style replacement.
Tags: code, commands, gettext, i18n, internationalization, javascript, l10n, language, localization, Mozilla Planet, ubiquity

June 12th, 2009 at 10:57 am
[…] Ubiquity localisation update: http://mitcho.com/blog/projects/ubiquity-localization-update/ […]
June 12th, 2009 at 11:36 am
I'm torn between the two options here. I prefer the syntax of .properties (and the built-in tools), but like that .po works such that you can do:
msgid "Hello, World!"
msgstr "Hallo, Welt!"
Where the ID is just the English version of the string. No need to decide on some normalized ID that conforms to arbitrary rules. When translating, you get the string you're translating from without needing to look up the source code, try the command, or have an additional comment (which I think is usually what happens in Mozilla-land). And as an upside, if the original text changes and needs to be re-translated, then you get the ID change for free.
June 12th, 2009 at 11:06 pm
gettext files need to be compiled before you can use them. Would you need to recompile them after every feed update?
June 13th, 2009 at 1:39 am
Mozilla localizers are familiar with the ".properties" format. If we use ".properties", there is an "English.properties" file, and this makes it easy to start localizing. I don't think spaces are needed in the key.
For external command feeds, define them within the command like this?
CmdUtils.CreateCommand({
  …
  messages: {
    "execute.hello": {en: "Hello!", ja: "こんにちは!"},
    "preview.hello": {en: "Hello!", ja: "こんにちは!"}
  }
});
June 13th, 2009 at 1:42 am
Stas, if we go with gettext (po) files, we'd be reading them in using JavaScript, and both of those public JS/gettext libraries support only po, not mo. If we just use this as is, we won't need to deal with the compiling issue.
June 13th, 2009 at 1:44 am
Do you mean we should make all localizable strings use keys in the code and put all localizable content in the localization files? So in the command,
_('helloMessage')
and then
hello.preview.helloMessage=Hello, world!
Is that what you mean?
June 13th, 2009 at 7:18 am
> Is that what you mean?
Yes. (except for verb names.)
June 14th, 2009 at 10:25 am
If you choose PO over properties the template thing is straightforward.
The ID changes are also handled better in the gettext format and if you think a msgid (source text) is ambiguous, you can use the msgctxt field to specify the context and disambiguate it.
Also please consider reading this about monolingual file formats: http://translate.sourceforge.net/wiki/guide/monolingual
June 15th, 2009 at 8:06 am
We shouldn't use po in its monolingual variant, that's just gonna make folks puke.
As for everything else, Gandalf and I are currently working on a json variant for l20n, which I owe .platform a post about. Not sure how well that's going to go, though.
I'm not sure how well re-implementing a po parser in js will go, it's usually easy to make the common things work, but if you're facing the output of a dozen different tools, you might have made faulty assumptions.
June 15th, 2009 at 8:14 am
Hi Axel,
Thanks for the comment. I've actually been playing around with jsgettext (http://jsgettext.berlios.de/) today and have been quite happy with it so far… It implements its own po parser but internally works with it as json. It even supports a json file format in addition to po, though I don't know how similar it is to what you and Gandalf have been working on.
We're hoping to get something workable out the door in the next week or so, though, so I think it's looking like po is the winner right now.
October 6th, 2009 at 8:32 pm
Awesome picture, Haha
October 14th, 2009 at 12:50 pm
The overall goal for this release of Ubiquity is to come up with a format and standard for localization. It is a great goal I should say.
October 21st, 2009 at 6:21 pm
I think you'll do fine with the .po version. As I understand, it is far superior for use with multilingual programs, and Japanese/English should do well.