mitcho Michael 芳貴 Erlewine

Linguist. Fifth year PhD student at MIT.

Posts Tagged ‘l10n’

The Aliens Aliases Have Landed

Friday, September 4th, 2009

This week I implemented a new way to customize and extend Ubiquity commands: CmdUtils.CreateAlias.

The use case for and importance of CreateAlias

CreateAlias lets you easily create a special-case alias of another, more generic verb. Ubiquity comes bundled with useful verbs like translate and search, which can serve a number of different purposes depending on their arguments. In some cases, and in some languages, though, typing out translate to English or search with Google is unnatural, as there is a more succinct and direct way to make that request. For example, in English one could say “anglicize” or “google”, respectively, for the verbs and arguments above. Indeed, in order to support both search with Google and google, Ubiquity has traditionally implemented two different verbs, search and google, which duplicate functionality and code.

CreateAlias lets us create such natural aliases without repeating ourselves. We can easily create an anglicize verb which, in one word, does the work of translate to English, or a google verb which is semantically equivalent to search with Google.

These sorts of aliases become particularly important in our perpetual quest to internationalize Ubiquity. One early discussion on our Ubiquity-i18n list concerned the fact that not all languages have the verb “Google”: in many languages it is necessary to explicitly say “search with Google”. Moreover, other languages may have domain-specific verbs which English doesn’t have either. Maybe some language has a special verb for “email with Hotmail” or “map Denmark”. Who knows? With CreateAlias we can easily enable such localizations based on the more generic commands bundled with Ubiquity.

Creating an alias

CreateAlias was designed to be incredibly simple to use. Here’s an example that will be bundled (but not installed by default) in Ubiquity:

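CmdUtils.CreateAlias({
  names: ["anglicize"],           // the name(s) the alias answers to
  verb: "translate",              // the target verb the alias delegates to
  givenArgs: { goal: "English" }  // arguments fixed in advance, keyed by semantic role
});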

As you can see, the syntax is straightforward. There are two required properties: names, an array of names for the alias, and verb, a reference to the target verb that this alias should use.1

The alias can also have a givenArgs property, which is a hash of pre-specified arguments keyed by their semantic roles. Because translate takes three arguments (an object text, a goal language, and a source language) but we have pre-specified a goal in givenArgs, the new anglicize command takes only two arguments: the object text and a source language. Of course, if you specify no givenArgs, you’ll get a simple synonym, without needing access to the original verb’s code.
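To make the givenArgs behavior concrete, here is a minimal, self-contained sketch of how an alias might wrap its target verb. It is not Ubiquity's implementation: the registry, the createCommand and createAlias helpers, and the verb shape (an arguments array of semantic roles plus preview and execute functions taking an args hash) are all hypothetical stand-ins. It shows the two effects described above: roles fixed by givenArgs drop out of the alias's own argument list, and they are merged back in before the call is delegated.

```javascript
// Hypothetical command registry keyed by canonical name (not Ubiquity's API).
const registry = {};

function createCommand(cmd) {
  registry[cmd.names[0]] = cmd;
  return cmd;
}

// Hypothetical stand-in for CmdUtils.CreateAlias: builds a new command whose
// arguments omit any role already fixed by givenArgs.
function createAlias({ names, verb, givenArgs = {}, ...rest }) {
  const target = registry[verb];
  if (!target) throw new Error(`unknown verb: ${verb}`);
  // Merge user-supplied args with the pre-specified ones; givenArgs win.
  const withGiven = (args) => ({ ...args, ...givenArgs });
  return createCommand({
    ...rest, // extra properties such as help or icon are carried over
    names,
    arguments: target.arguments.filter((a) => !(a.role in givenArgs)),
    preview: (args) => target.preview(withGiven(args)),
    execute: (args) => target.execute(withGiven(args)),
  });
}

// A toy translate verb with three semantic roles.
createCommand({
  names: ["translate"],
  arguments: [{ role: "object" }, { role: "goal" }, { role: "source" }],
  preview: (args) =>
    `Translate "${args.object}" from ${args.source || "(auto)"} to ${args.goal}`,
  execute: (args) => `translated:${args.object}:${args.goal}`,
});

const anglicize = createAlias({
  names: ["anglicize"],
  verb: "translate",
  givenArgs: { goal: "English" },
});
```

With this sketch, anglicize exposes only the object and source roles, and anglicize.preview({ object: "こんにちは" }) delegates to the toy translate preview with the goal set to English.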

[Screenshot: anglicize]

As you can see, the preview of this command is simply the preview of the translate verb. Its preview and execution behave just as if you had entered translate こんにちは to English.

Just like other commands created with CreateCommand, the object specifying the alias can also have properties like help, description, author information, and so on. I used the icon property to add a Union Jack to it so that it is easily identifiable.

Bonus: using CmdUtils.previewCommand and CmdUtils.executeCommand

On the road to implementing CreateAlias, I also implemented the CmdUtils.previewCommand and CmdUtils.executeCommand functions. Most of this code comes from earlier work by Louis-Rémi Babé, which I adapted to the modern Ubiquity system. With previewCommand and executeCommand you can take advantage of the preview or execute functionality of another command. In the new alias-commands feed I included a command called germanize, which is essentially a direct analogue of anglicize, seen above, but built with these functions inside a CreateCommand. CreateAlias is much more straightforward for simple aliases, but for more complex subcommands, where you want to adapt another verb’s execution or preview, or reuse one of them while re-implementing the other, previewCommand and executeCommand are the way to go.
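The exact signatures of CmdUtils.previewCommand and CmdUtils.executeCommand are not spelled out in this post, so this sketch uses hypothetical stand-ins, delegatePreview and delegateExecute, over a toy command registry. It mirrors the germanize idea under those assumptions: another verb's preview is reused unchanged, while execution delegates and then layers extra behavior on top.

```javascript
// A toy registry standing in for Ubiquity's installed commands (hypothetical).
const commands = new Map();

function register(cmd) {
  commands.set(cmd.names[0], cmd);
  return cmd;
}

function lookupCommand(verbName) {
  const cmd = commands.get(verbName);
  if (!cmd) throw new Error(`no such command: ${verbName}`);
  return cmd;
}

// Hypothetical analogues of CmdUtils.previewCommand and
// CmdUtils.executeCommand: run another command's preview or execute.
function delegatePreview(verbName, args) {
  return lookupCommand(verbName).preview(args);
}

function delegateExecute(verbName, args) {
  return lookupCommand(verbName).execute(args);
}

// A toy translate command that just describes what it would do.
register({
  names: ["translate"],
  preview: ({ object, goal }) => `Preview: "${object}" -> ${goal}`,
  execute: ({ object, goal }) => `Inserted translation of "${object}" into ${goal}`,
});

// A germanize-style command written as a full command rather than an alias:
// the preview is reused as-is, and execution is adapted by logging the
// result after delegating.
const executionLog = [];
register({
  names: ["germanize"],
  preview: ({ object }) => delegatePreview("translate", { object, goal: "German" }),
  execute: ({ object }) => {
    const result = delegateExecute("translate", { object, goal: "German" });
    executionLog.push(result);
    return result;
  },
});
```

A command that only needed a custom preview would do the reverse: write its own preview and hand execution straight to delegateExecute.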


  1. The verb reference can be either the canonical (reference) name of a command, which is the first entry in the command’s names (and the name listed in the command list when Ubiquity is running in English), or the command’s internal ID, which looks like resource://ubiquity/standard-feeds/general.html#translate.
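A rough sketch of the lookup described in footnote 1, assuming each installed command is a record holding its names array and an internal id string of the form shown above. The record shape, the second feed URL, and resolveVerb are hypothetical; following the footnote, only the canonical (first) name or the exact ID is accepted, not secondary names.

```javascript
// Hypothetical command records: the canonical name is names[0].
const installed = [
  {
    id: "resource://ubiquity/standard-feeds/general.html#translate",
    names: ["translate"],
  },
  {
    id: "resource://example/hypothetical-feed.html#search", // invented URL
    names: ["search", "find"],
  },
];

// Resolve a verb reference given either as an internal ID or as the
// canonical (first) name of a command. Returns undefined if nothing matches.
function resolveVerb(ref, commands) {
  // Internal IDs are matched exactly.
  const byId = commands.find((c) => c.id === ref);
  if (byId) return byId;
  // Otherwise compare against each command's canonical name only.
  return commands.find((c) => c.names[0] === ref);
}
```

Under this sketch, resolveVerb("search", installed) finds the second record, while resolveVerb("find", installed) does not, because "find" is not a canonical name.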

Let’s talk about how cool our localizers are

Tuesday, August 11th, 2009

I uploaded Ubiquity to BabelZilla, an online community and tool for localizing Mozilla-style strings, just a couple of days ago, and we already have French and Polish complete.1 WOW!

[Screenshot: BabelZilla localization status]

Granted, these are only Ubiquity’s interface strings (for example, the about and settings pages)… the parser localization and command localization have their own processes.2 But this is still a tremendous accomplishment!

Hopefully we can roll some of these complete or almost-complete interface localizations into Ubiquity 0.5.4, a minor bugfix update coming soon. If you would like to help localize the Ubiquity interface strings into your language, get a BabelZilla login and sign up on the Ubiquity project page. Thanks again to our rockin’ localizers!


  1. As I was writing this blog post, I received notification that the Polish localization in particular has completed testing and is now ready for release.

  2. Perhaps this anecdote is telling us that a nice centralized web interface, where localizers can work together without touching the files directly, is a plus. Perhaps we should put the built-in commands up for localization on something like Pootle or Launchpad. Thoughts, anyone? 

A Visual Guide to Community Command Localization

Monday, July 13th, 2009

A natural language interface is only “natural” if it’s in your natural language. With this mantra in mind, we’ve been making steady progress on the challenging problem of Ubiquity localization. The first fruit of this research is the localization of the parser and bundled commands in Ubiquity 0.5. Today I present a visual guide to command localization in Ubiquity and the different options we could take in attacking the community command localization problem.

Ubiquity Localization: What’s New, What’s Next

Thursday, July 9th, 2009

Yesterday we released Ubiquity 0.5, a major update to the already popular Ubiquity platform. Among numerous other features, Ubiquity 0.5 includes the first fruit of months of research on building a multilingual parser and natural language interface. In this blog post I’ll give a quick overview of the new internationalization-related features in Ubiquity 0.5, along with a brief roadmap of future considerations.

Of course, one of the best ways to learn about the new features is to experience them… try Ubiquity 0.5 now!

Localizing Commands for Ubiquity 0.5

Thursday, June 25th, 2009

As many of you know, earlier this week we released a preview of version 0.5 (0.5pre). We’re going to stress test and refine this release through the weekend and push the official 0.5 out next Tuesday. This release will have fully localized commands for Danish and Japanese, as well as parser settings for a number of other languages. Read this Labs blog post to learn more about the 0.5 release and how to test it.

It’s not too late to add localizations for other languages to 0.5, though. Localizations help make Ubiquity more “natural” for more users, offering a new level of ease and familiarity to the already powerful Ubiquity. We have a new tutorial to help you localize commands.

To help encourage command localization, we now have gettext-style PO template files for all the bundled command feeds in the hg repository. You can find these files in the ubiquity/localization/templates directory of the repository, or browse them in our online hg repository.
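For readers new to gettext: each entry in a PO file (or PO template) pairs a msgid, the source string, with a msgstr, its translation, which is left empty in a template. As a minimal sketch, the parser here handles only single-line msgid/msgstr pairs and comment lines; it ignores msgctxt, plural forms, and multi-line strings, all of which real catalogs can use. The parsePo and lookup helpers are hypothetical, and the sample entries are invented rather than taken from Ubiquity's templates.

```javascript
// Parse simple gettext PO text into a Map of msgid -> msgstr.
// Handles only one-line `msgid "..."` / `msgstr "..."` pairs and `#` comments.
function parsePo(text) {
  const catalog = new Map();
  let currentId = null;
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const m = line.match(/^(msgid|msgstr)\s+"((?:[^"\\]|\\.)*)"$/);
    if (!m) continue; // syntax this sketch does not support
    // Unescape \" \\ and \n inside the quoted string.
    const value = m[2].replace(/\\(["\\n])/g, (_, c) => (c === "n" ? "\n" : c));
    if (m[1] === "msgid") {
      currentId = value;
    } else if (currentId !== null) {
      catalog.set(currentId, value);
      currentId = null;
    }
  }
  // The header entry has an empty msgid; drop it from the catalog.
  catalog.delete("");
  return catalog;
}

// Look up a translation, falling back to the source string when the
// msgstr is missing or empty (as it is in an untranslated template).
function lookup(catalog, msgid) {
  const msgstr = catalog.get(msgid);
  return msgstr ? msgstr : msgid;
}

const sample = [
  'msgid ""',
  'msgstr ""',
  "",
  "#. invented example entry",
  'msgid "Translates text into another language"',
  'msgstr "Oversætter tekst til et andet sprog"',
  "",
  'msgid "Search the web"',
  'msgstr ""',
].join("\n");

const catalog = parsePo(sample);
```

Falling back to the msgid keeps a partially translated catalog usable, which is one reason incomplete localizations are still worth submitting.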

If you complete localizations for your language (even partial ones) and would like to submit them to the repository, for the time being you can post them on this trac ticket.1

I’ll be looking forward to seeing your localizations! If you have any questions, feel free to ask on the ubiquity-i18n Google group or on irc.mozilla.org#ubiquity. ^^


  1. In the post-0.5 future we’ll be rethinking how best to organize these localization files and give commit access to as many localizers as possible. 

Ubiquity Localization Update

Friday, June 12th, 2009

As we move closer and closer to shipping the next Ubiquity release, there is still much work to be done, particularly in the area of localization. In a recent Ubiquity meeting we laid out the explicit localization goals and non-goals for 0.5 as follows:

  • Goals for 0.5
    • Parser 2 (on by default)
    • underlying support for localization of commands
    • localization of standard feed commands for a few languages
    • Parser 2 language files for those same languages
  • Nongoals for 0.5
    • distribution/sharing of localizations
    • localization of nountypes

The overall goal for this release of Ubiquity is to come up with a format and standard for localization. Localizations in Ubiquity 0.5 will only apply to commands bundled with Ubiquity, and the localization files themselves will be distributed with Ubiquity. In a future release we will tackle the problem of localizations for commands in the wild and truly crowd-source1 this process.


  1. Or “cloud-source”… finally a Japanese accent joke that’s semantically stable! 

Localizing Ubiquity: commands and nountypes

Monday, May 25th, 2009

Now that Parser 2 is in decent shape and a number of parsing problems in different languages have been tackled, the focus has shifted to coming up with an approach for localizing Ubiquity commands and nountypes. At last week’s Ubiquity meeting we had a great conversation on this subject, which has since continued on the Google group.

I’ve been framing this problem as two subproblems:

  1. What will be the data structure of localized commands/nountypes within Ubiquity?
  2. How do we distribute/share these localizations?

We’ve mostly been discussing the first problem, weighing the merits of unified objects (with different localized text as different JS properties) as opposed to a gettext-style approach, and noting that our requirements for commands and nountypes may be different. I hope we can discuss the second issue more in the coming week.
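To make the two options concrete, here is a hypothetical sketch of both for a made-up translate-like command. In the first, every localized string lives on the command object itself, keyed by language code; in the second, the command is written once in English and each string is looked up in a per-language catalog, gettext-style. Neither shape is a decided Ubiquity format: the property layout, namesFor, makeGettext, defineCommand, and the Danish strings are all illustrative assumptions.

```javascript
// Option A: a unified object, with each localized string as a property
// keyed by language code (hypothetical shape).
const unifiedCommand = {
  names: { en: ["translate"], da: ["oversæt"] },
  description: {
    en: "Translates text into another language",
    da: "Oversætter tekst til et andet sprog",
  },
};

function namesFor(cmd, lang) {
  // Fall back to English when a language has not been filled in.
  return cmd.names[lang] || cmd.names.en;
}

// Option B: gettext-style. Strings are written once in English and passed
// through a lookup into a per-language catalog kept outside the command.
const catalogs = {
  da: {
    translate: "oversæt",
    "Translates text into another language": "Oversætter tekst til et andet sprog",
  },
};

function makeGettext(lang) {
  const catalog = catalogs[lang] || {};
  // Return the translation if present, otherwise the English source string.
  return (msgid) =>
    Object.prototype.hasOwnProperty.call(catalog, msgid) ? catalog[msgid] : msgid;
}

function defineCommand(_) {
  return {
    names: [_("translate")],
    description: _("Translates text into another language"),
  };
}
```

The trade-off shows up even at this size: option A keeps everything in one place but means every localization edits the command's own source, while option B lets catalogs be maintained separately at the cost of one level of indirection.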

Should everything go through the command author? Should localization be centralized through some web tool? Should it be completely distributed like commands currently are? I invite you to join us in this conversation on the Google group. ^^

Big Issues and Small Issues with Parser 2

Wednesday, May 20th, 2009

Jono and I had a good conversation this morning on IRC about the remaining Big Issues which are blocking the release of Parser 2 as the default parser for Ubiquity. Here are our Top 4 Big Issues:

  1. The preview and execute functions of some commands are not working properly (trac #652). This could be an underlying issue with some pipes not rerouted correctly in Parser 2, or it could be that the commands have not been rewritten correctly to take advantage of Parser 2.
  2. Flesh out how to localize resources, like commands and nountypes. We started a conversation on this subject a few weeks ago but we never reached a resolution. This blocks issues 3 and 4 below.
  3. We need to standardize a format for commands for Parser 2. As noted in last week’s meeting (among other places) Parser 2 will require at least some modification to all commands. Jono and I came up with a simple hybrid format for commands which specify takes and modifiers for Parser 1 and arguments for Parser 2, but until we figure out how exactly the localization of commands will work, we can’t write a definitive standard.
  4. Enable nountype localization. While the most commonly used nountypes are those that ship with Ubiquity, it is important to come up with a localization process which can apply to custom nountypes as well. Nountype localizations need the ability to either (1) replace the _name only, or (2) replace both the _name and the suggest() logic, as both cases will be necessary.
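Big Issue 4 describes two kinds of nountype localization. Here is a hypothetical sketch of applying either one, assuming a nountype is an object with a _name string and a suggest(text) method returning matches. The localizeNounType helper, the override shape, and the Japanese lookup table are invented for illustration and are not a settled Ubiquity API.

```javascript
// A toy nountype: _name is the human-readable label, suggest() proposes
// matches for the user's input (shape assumed for illustration).
const nounTypeLanguage = {
  _name: "language",
  languages: ["English", "German", "Japanese"],
  suggest(text) {
    const t = text.toLowerCase();
    return this.languages.filter((l) => l.toLowerCase().startsWith(t));
  },
};

// Produce a localized copy without mutating the original nountype.
// Case (1): override only _name. Case (2): override _name and suggest().
function localizeNounType(base, override) {
  const localized = Object.create(base); // inherits everything not overridden
  localized._name = override._name;
  if (typeof override.suggest === "function") {
    localized.suggest = override.suggest;
  }
  return localized;
}

// Case (1), Danish: same matching logic, new label.
const daLanguage = localizeNounType(nounTypeLanguage, { _name: "sprog" });

// Case (2), Japanese: new label and new matching logic that accepts
// Japanese language names (a tiny hypothetical lookup table).
const jaNames = { 英語: "English", ドイツ語: "German", 日本語: "Japanese" };
const jaLanguage = localizeNounType(nounTypeLanguage, {
  _name: "言語",
  suggest(text) {
    return Object.keys(jaNames)
      .filter((name) => name.startsWith(text))
      .map((name) => jaNames[name]);
  },
});
```

Using a prototype link means a name-only localization automatically picks up later fixes to the base nountype's suggest() logic.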

Given that Big Issue 3 and Big Issue 4 are both dependent on Big Issue 2, there clearly needs to be a continued public discussion of how we should make these resources localizable. I look forward to this discussion taking place at tomorrow’s joint (general + i18n) Ubiquity meeting.
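Big Issue 3 mentions a hybrid command format that carries both Parser 1's takes and modifiers and Parser 2's arguments. The following is a hypothetical sketch of that idea: the field names follow the description in the list, but the placeholder noun types, the role names, the preposition-to-role table, and the toArguments migration helper are assumptions for illustration, not the standard we end up writing.

```javascript
// Placeholder noun types (real commands would use Ubiquity's nountypes).
const nounArbText = { _name: "text" };
const nounLanguage = { _name: "language" };

// Hypothetical hybrid command: Parser 1 reads takes/modifiers,
// Parser 2 reads arguments.
const hybridTranslate = {
  names: ["translate"],
  // Parser 1: one direct object plus English prepositions as modifiers.
  takes: { text: nounArbText },
  modifiers: { to: nounLanguage, from: nounLanguage },
  // Parser 2: language-neutral semantic roles.
  arguments: [
    { role: "object", nountype: nounArbText, label: "text" },
    { role: "goal", nountype: nounLanguage, label: "language" },
    { role: "source", nountype: nounLanguage, label: "language" },
  ],
};

// Assumed mapping from English prepositions to Parser 2 roles.
const PREPOSITION_ROLES = { to: "goal", from: "source", with: "instrument", in: "location" };

// Derive a Parser 2 arguments array from Parser 1 fields, e.g. to help
// migrate an existing command. Unknown prepositions raise an error.
function toArguments({ takes = {}, modifiers = {} }) {
  const args = Object.entries(takes).map(([label, nountype]) => ({
    role: "object",
    nountype,
    label,
  }));
  for (const [prep, nountype] of Object.entries(modifiers)) {
    const role = PREPOSITION_ROLES[prep];
    if (!role) throw new Error(`no role known for preposition "${prep}"`);
    args.push({ role, nountype, label: nountype._name });
  }
  return args;
}
```

Keeping both descriptions side by side lets one command file serve either parser during the transition, at the cost of having to keep the two in sync.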

In other news, here are some Small Issues:

  1. Add a switch for parser version and language settings: Jono’s already made a space for this in the new “Settings and Skins” page in about:ubiquity. He’s on it. Like a bonnet.
  2. Magic word (anaphor) substitution is not yet working properly. This needs to work both when there is an explicit magic word and when there are simply missing arguments.
  3. The position of suggested verbs is always sentence-initial (trac #655). This also requires that we can specify whether verb name localizations are sentence-initial forms or sentence-final forms.1
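Small Issue 3 asks for suggestions whose verb position follows the language. As a hypothetical sketch, each localized verb name could carry a position flag, and the suggestion text is assembled accordingly. The verbForms data shape and formatSuggestion are assumptions, and the German forms (imperative sentence-initial, infinitive sentence-final) are only an illustration.

```javascript
// Hypothetical localized verb names tagged with where the verb goes.
const verbForms = {
  en: [{ name: "translate", position: "initial" }],
  de: [
    { name: "übersetze", position: "initial" }, // imperative form
    { name: "übersetzen", position: "final" }, // infinitive form
  ],
};

// Assemble a suggestion from a verb form and its already-ordered
// argument phrases, putting the verb first or last as flagged.
function formatSuggestion(form, argPhrases) {
  const words =
    form.position === "final" ? [...argPhrases, form.name] : [form.name, ...argPhrases];
  return words.join(" ");
}
```

For example, formatSuggestion(verbForms.de[1], ["hallo", "ins Englische"]) yields "hallo ins Englische übersetzen", while the English form stays sentence-initial.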

Let’s hit the code!


  1. German, Dutch, and Greek, for example, are all languages with both sentence-initial and sentence-final command verb forms. 

Adding Your Language to Ubiquity Parser 2

Wednesday, April 29th, 2009

NOTE: This blog post has now been added to the Ubiquity wiki and is updated there. Please disregard this article and instead follow the instructions on the wiki.

You’ve seen the video. You speak another language. And you’re wondering, “how hard is it to add my language to Ubiquity with Parser 2?” The answer: not that hard. With a little bit of JavaScript and some knowledge of and interest in your own language, you’ll be able to get at least rudimentary Ubiquity functionality in your language. Follow along in this step-by-step guide and please submit your (even incomplete) language files!

As Ubiquity Parser 2 evolves, there is a chance that this specification will change in the future. Keep abreast of such changes on the Ubiquity Planet and/or this blog (RSS).

Foxkeh demos Ubiquity Parser: The Next Generation

Wednesday, April 1st, 2009

I just made a screencast with Foxkeh to show off the Ubiquity next-generation parser demo and to demonstrate how easy it is to add your own language. Foxkeh wants you to localize the parser into your language. How could you say no? ^^


[Video: Foxkeh demos Ubiquity Parser: The Next Generation, from mitcho on Vimeo]

There are some details which are not covered in this introductory video, such as how to deal with case marking languages or languages without spaces. Hopefully this’ll inspire some people to play with the demo, though. I’d love to hear your comments! ^^

This week on Ubiquity Parser: The Next Generation

Friday, March 27th, 2009

Last week I released a proof-of-concept demo of the next generation Ubiquity parser design, and it was also the focus of discussion in our weekly internationalization meeting.1 Christian Sonne even wrote a Danish plugin for it during the meeting, a testament to the pluggability of the new parser design.

In addition, at the Ubiquity weekly meeting, pushing this new parser into Ubiquity proper was identified as a key goal of Ubiquity 0.2, making frequent iteration and debate over this parser essential.

To that end, I’ll highlight some of the changes made to the parser demo codebase in the past week.


  1. The weekly internationalization meeting, like all Ubiquity weekly meetings, is completely open to the public. We’d love to hear new voices contribute to the discussion! Take a look at the schedule of upcoming meetings.

Ubiquity in Portuguese

Thursday, March 5th, 2009

Felipe, a Ubiquity user, put together a wonderful piece on what Ubiquity might look like in Portuguese. He makes some great points, particularly regarding the “map” verb used in English: Felipe points out that Portuguese does not have a very common “map” verb, and that it would be much more natural to enter me dê (literally “me give”) to request a map. This is a great example of how Jono’s overlord verbs proposal may be an important aspect of our i18n efforts. The post is also timely, as we’ve recently been discussing in our regular meetings (open to all!) that Portuguese could be the focus of our next parser construction efforts.

What would the challenges be for Ubiquity in your language? We’d love to see an increasing number of blog posts on this topic in different languages. Thanks Felipe! ^^