A natural language interface is only “natural” if it’s in your natural language. With this mantra in mind, we’ve been making steady progress on the challenging problem of Ubiquity localization. The first fruits of this research are the localized parser and bundled commands in Ubiquity 0.5. Today I’d like to present a visual guide to command localization in Ubiquity and the different approaches we could take to the community command localization problem.
I have been accepted to present a short paper entitled “Ubiquity: Designing a Multilingual Natural Language Interface” at the ACM SIGIR Workshop on Information Access in a Multilingual World in Boston on July 23rd. I’ll probably be in Boston for a few days before or after as well, to look for an apartment for the fall. If you’ll be in Boston at that time and would like to meet up, or if you’re near Cambridge and looking for an apartment-mate, please let me know.
If you would like to see a preprint of the paper, please contact me at x@x.com where x=mitcho.
As Ubiquity 0.5 will be released soon (Thursday morning in Mountain View), I decided it was a good time to put together a screencast in Japanese demonstrating the new Japanese parser and commands.
As many of you know, earlier this week we released a preview of version 0.5 (0.5pre). We’re going to stress test and refine this release through the weekend and push the official 0.5 out next Tuesday. This release will have fully localized commands for Danish and Japanese, as well as parser settings for a number of other languages. Read this Labs blog post to learn more about the 0.5 release and how to test it.
It’s not too late to add localizations for other languages to 0.5, though. Localizations help make Ubiquity more “natural” for more users, offering a new level of ease and familiarity to the already powerful Ubiquity. We have a new tutorial to help you localize commands.
To help encourage command localization, we now have gettext-style .po template files for all the bundled command feeds in the hg repository. You can find these files in the ubiquity/localization/templates directory of the repository, or browse them in our online hg repository.
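For anyone who hasn’t worked with gettext before: a .po file is essentially a list of msgid/msgstr pairs, where msgid is the original English string and msgstr is its translation. Here’s a rough TypeScript sketch of reading such a file into a lookup table. It’s only an illustration, not Ubiquity’s actual loading code; the function names and the sample catalog are hypothetical, and it doesn’t support msgctxt, plural forms, or fuzzy flags.

```typescript
// Illustrative sketch only: load a gettext-style .po catalog into a Map from
// msgid (source text) to msgstr (translation). Assumption: only msgid, msgstr
// and quoted continuation lines are handled; msgctxt, plural forms
// (msgid_plural / msgstr[n]) and "fuzzy" flags are not supported.

function unquote(token: string): string {
  const t = token.trim();
  // Drop the surrounding double quotes, then undo the \" \\ \n \t escapes.
  return t
    .slice(1, -1)
    .replace(/\\(["\\nt])/g, (_match: string, c: string) =>
      c === "n" ? "\n" : c === "t" ? "\t" : c,
    );
}

function parsePo(text: string): Map<string, string> {
  const table = new Map<string, string>();
  let id = "";
  let str = "";
  let field: "id" | "str" | null = null;

  const flush = (): void => {
    // The header entry has an empty msgid and untranslated entries have an
    // empty msgstr; neither is useful for lookups, so both are skipped.
    if (id !== "" && str !== "") table.set(id, str);
    id = "";
    str = "";
    field = null;
  };

  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.startsWith("msgid ")) {
      flush(); // a new msgid starts a new entry
      id = unquote(line.slice("msgid ".length));
      field = "id";
    } else if (line.startsWith("msgstr ")) {
      str = unquote(line.slice("msgstr ".length));
      field = "str";
    } else if (line.startsWith('"') && field !== null) {
      // A bare quoted line continues whichever field was read last.
      if (field === "id") id += unquote(line);
      else str += unquote(line);
    }
    // Comments (#...) and blank lines are skipped.
  }
  flush();
  return table;
}

// Hypothetical Japanese catalog with a header entry and two translations.
const samplePo = [
  'msgid ""',
  'msgstr ""',
  '"Content-Type: text/plain; charset=UTF-8\\n"',
  "",
  'msgid "translate"',
  'msgstr "翻訳"',
  "",
  'msgid "search "',
  '"the web"',
  'msgstr "ウェブを検索"',
].join("\n");

console.log(parsePo(samplePo).get("search the web")); // ウェブを検索
```

In a template (.pot) file every msgstr is still empty, so this sketch would return an empty table until a translator fills the entries in.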
If you complete localizations for your language (even partial ones) and would like to submit them to the repository, for the time being you can post them on this trac ticket.[^1]
As we get close to wrapping up Ubiquity 0.5 (currently planned to ship on Monday, fingers crossed), one remaining issue is how to incorporate our cute new Cocoia-designed, community-produced icon, the Ubiquibot. The difficult decision is how to take this finely detailed icon and produce a 16×16 favicon.
I came up with three different options:
(Images: favicon options 1, 2, and 3.)
Seeing them on my blog isn’t quite the same as seeing them where they’ll actually be used, so here are some screenshots of them in context:[^1]
Goals for 0.5
localization of standard feed commands for a few languages
Parser 2 language files for those same languages
Non-goals for 0.5
distribution/sharing of localizations
localization of nountypes
The overall goal for this release of Ubiquity is to come up with a format and standard for localization. Localizations in Ubiquity 0.5 will apply only to commands bundled with Ubiquity, and the localization files themselves will be distributed with Ubiquity. In a future release we will tackle localization for commands in the wild and truly crowdsource[^1] this process.
This past Monday I presented at Tokyo 2.0, Japan’s largest bilingual web/tech community, as part of a session on “The Web and Language,” which I also helped organize. Other presenters included Junji Tomita from goo Labs; Shinjyou Sunao of Knowledge Creation, developers of the Voice Delivery System API; and Chris Salzberg of Global Voices Online, who spoke on community translation.
I just put together a video of my Ubiquity presentation, mixing the audio recorded live at the presentation with a screencast of my slides for better visibility. The presentation is 10 minutes long and bilingual, in English and Japanese.
Here’s a quick screencast highlighting some of the changes to Parser 2 and the updated Parser 2 Playpen. This video should be particularly useful to people hoping to add their language to Parser 2. It’s also a good reference for Ubiquity core developers.
Yesterday I was invited to give a lecture for students in the MEXT IT Specialist Program (ITSP). ITSP is a partnership between Keio, Waseda, and Chuo Universities and NTT, IBM, and Mozilla that brings advanced IT training and opportunities to their Master’s students. It was a fairly long time slot, so I split it into two talks: one on open source and open processes (similar to one of my sessions at the recent BarCamp Tokyo) and one on the future of interfaces, internationalization and globalization, and Ubiquity. Here are the slides for posterity. (Note: the second set of slides is mostly in Japanese.)
This past weekend was Mozilla Party JP 10 here in Japan, and one of the speakers was Bob Chao (趙柏強) of Creative Commons Taiwan and MozTW. We got to talking in Chinese, and he recorded a video interview of me talking about Ubiquity, our upcoming Parser 2, and the challenges of localization. I’d never talked about my Mozilla work in Chinese before, so it was definitely a challenge and I stumbled a lot, but hopefully some of the ideas got through.
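The day before yesterday I attended Mozilla Party 10, a gathering of the Japanese Mozilla community. It was there that I first met 趙柏強 (Bob Chao) of the Taiwanese Mozilla community, and we started chatting in Mandarin. I’ve really missed using Chinese, so I was very happy to have the chance to talk with him. Afterward he filmed a video in which I introduced Ubiquity to Taiwanese Mozilla fans. My Chinese is a real mess, so my apologies, everyone. ^^;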
Jono has recently been thinking about how to get users involved with Ubiquity beyond programming, and he decided to put the textual content of Ubiquity’s built-in commands and the new interactive tutorial on the wiki for all to edit.
Changes made to these wiki pages will be tracked and edits will be moved back into the Ubiquity codebase as early as 0.1.9.
Combined with the imminent internationalization of Ubiquity commands, which will let contributors localize commands without digging into the JavaScript code, there will soon be lots of different ways to get involved with the further development of Ubiquity!
Now that Parser 2 is in decent shape and a number of parsing problems in different languages have been tackled, the focus has shifted to coming up with an approach for localizing Ubiquity commands and nountypes. At last week’s Ubiquity meeting we had a great conversation on this subject, which has since continued on the Google group.
I’ve been framing this problem as two subproblems:
What will be the data structure of localized commands/nountypes within Ubiquity?
How do we distribute/share these localizations?
We’ve mostly been discussing the first problem, weighing the merits of unified objects (with different localized text as different JS properties) as opposed to a gettext-style approach, and noting that our requirements for commands and nountypes may be different. I hope we can discuss the second issue more in the coming week.
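To make the two options concrete, here’s a toy TypeScript sketch of the same command text stored both ways. Everything in it (LocalizedText, pick, catalogs, gettext) is a hypothetical name for illustration, not an existing Ubiquity API, and the Japanese strings are just sample text.

```typescript
// Toy comparison of the two data structures under discussion.
// All names here are hypothetical illustrations, not Ubiquity APIs.

// Approach 1: a unified object, with each language's text as its own property.
type LocalizedText = Record<string, string>;

const unifiedCommand: { names: LocalizedText; description: LocalizedText } = {
  names: { en: "translate", ja: "翻訳" },
  description: {
    en: "Translates the selected text.",
    ja: "選択したテキストを翻訳します。",
  },
};

// Look up the text for a language, falling back to English when it is missing.
function pick(text: LocalizedText, lang: string, fallback = "en"): string {
  return text[lang] ?? text[fallback];
}

// Approach 2: gettext-style. The command is written with English strings, and
// each language has a separate catalog keyed by those English strings.
const catalogs: Record<string, Map<string, string>> = {
  ja: new Map([
    ["translate", "翻訳"],
    ["Translates the selected text.", "選択したテキストを翻訳します。"],
  ]),
};

// Return the translation if one exists, otherwise the English msgid itself.
function gettext(msgid: string, lang: string): string {
  return catalogs[lang]?.get(msgid) ?? msgid;
}

console.log(pick(unifiedCommand.names, "ja")); // 翻訳
console.log(gettext("translate", "da")); // translate (no Danish catalog in this sketch)
```

The trade-off shows up even at this size: with unified objects, adding a language means editing the command itself, while gettext-style catalogs let localizers work in separate files without touching the JavaScript, at the cost of keying every translation on the English source string.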
Should everything go through the command author? Should localization be centralized through some web tool? Should it be completely distributed like commands currently are? I invite you to join us in this conversation on the Google group. ^^
Jono and I had a good conversation this morning on IRC about the remaining Big Issues which are blocking the release of Parser 2 as the default parser for Ubiquity. Here are our Top 4 Big Issues:
Some commands’ preview and execute functions are not working properly (trac #652). This could be an underlying issue with some pipes not being rerouted correctly in Parser 2, or it could be that the commands have not been correctly rewritten to take advantage of Parser 2.
Flesh out how to localize resources, like commands and nountypes. We started a conversation on this subject a few weeks ago but we never reached a resolution. This blocks issues 3 and 4 below.
We need to standardize a format for commands for Parser 2. As noted in last week’s meeting (among other places), Parser 2 will require at least some modification to all commands. Jono and I came up with a simple hybrid command format that specifies takes and modifiers for Parser 1 and arguments for Parser 2, but until we figure out exactly how the localization of commands will work, we can’t write a definitive standard.
Enable nountype localization. While the most popular nountypes used are those that ship with Ubiquity, it is important to come up with a localization process which can apply to custom nountypes as well. Nountype localizations need the ability to either (1) replace the _name only, or (2) replace both the _name and the suggest() logic, as both cases will be necessary.
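Here’s a rough TypeScript sketch of what those two cases might look like, assuming a nountype is an object with a _name and a suggest() method as described above. localizeNounType, the number and weekday nountypes, and the Japanese names are all hypothetical examples, not Ubiquity’s actual nountypes.

```typescript
// Hypothetical sketch of the two kinds of nountype localization.
// Only the _name / suggest() shape comes from the discussion above.

interface NounType {
  _name: string;
  suggest(text: string): string[];
}

// A localization always supplies a new _name and may also supply suggest().
interface NounTypeLocalization {
  _name: string;
  suggest?: (text: string) => string[];
}

function localizeNounType(base: NounType, loc?: NounTypeLocalization): NounType {
  if (!loc) return base;
  return {
    _name: loc._name,
    // Case (2) overrides the suggestion logic; case (1) keeps the original.
    suggest: loc.suggest ?? ((text: string) => base.suggest(text)),
  };
}

// Suggest every candidate that starts with the (case-insensitive) input.
const prefixSuggest =
  (candidates: string[]) =>
  (text: string): string[] =>
    text === ""
      ? []
      : candidates.filter((c) => c.toLowerCase().startsWith(text.toLowerCase()));

const numberNoun: NounType = {
  _name: "number",
  suggest: (text) => (/^\d+$/.test(text) ? [text] : []),
};

const weekdayNoun: NounType = {
  _name: "weekday",
  suggest: prefixSuggest(["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]),
};

// Case (1): digits are language-neutral, so only the name changes.
const numberJa = localizeNounType(numberNoun, { _name: "数値" });

// Case (2): Japanese weekday names need their own suggestion logic.
const weekdayJa = localizeNounType(weekdayNoun, {
  _name: "曜日",
  suggest: prefixSuggest(["月曜日", "火曜日", "水曜日", "木曜日", "金曜日", "土曜日", "日曜日"]),
});

console.log(numberJa.suggest("42").join(", ")); // 42
console.log(weekdayJa.suggest("火").join(", ")); // 火曜日
```

Case (1) fits nountypes whose input is language-neutral, like digits; case (2) fits nountypes whose valid inputs are themselves words in a particular language.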
Given that Big Issue 3 and Big Issue 4 are both dependent on Big Issue 2, there clearly needs to be a continued public discussion of how we should make these resources localizable. I look forward to this discussion taking place at tomorrow’s joint (general + i18n) Ubiquity meeting.
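To make Big Issue 3 a little more concrete, here’s a rough TypeScript sketch of what a hybrid command description might look like. The field names takes, modifiers, and arguments come from the discussion above; the semantic role names, the preposition-to-role table, and parser2Arguments are assumptions for illustration only, since the real format can’t be fixed until the localization question is settled.

```typescript
// Hypothetical sketch of a hybrid command description: Parser 1 reads
// `takes` and `modifiers`, Parser 2 reads `arguments`. Role names and the
// preposition table are illustrative assumptions, not a standard.

interface NounType {
  _name: string;
}

interface Argument {
  role: string; // semantic role, e.g. "object" or "goal"
  nountype: NounType;
  label: string;
}

interface HybridCommand {
  names: string[];
  takes?: Record<string, NounType>; // Parser 1: direct object, keyed by label
  modifiers?: Record<string, NounType>; // Parser 1: keyed by English preposition
  arguments?: Argument[]; // Parser 2: keyed by semantic role
}

// Assumed mapping for a few English prepositions; anything else becomes "modifier".
const ROLE_FOR_PREPOSITION = new Map<string, string>([
  ["to", "goal"],
  ["from", "source"],
  ["with", "instrument"],
]);

// Prefer explicit Parser 2 arguments; otherwise derive them from the Parser 1 fields.
function parser2Arguments(cmd: HybridCommand): Argument[] {
  if (cmd.arguments) return cmd.arguments;
  const args: Argument[] = [];
  for (const [label, nountype] of Object.entries(cmd.takes ?? {})) {
    args.push({ role: "object", nountype, label });
  }
  for (const [prep, nountype] of Object.entries(cmd.modifiers ?? {})) {
    args.push({ role: ROLE_FOR_PREPOSITION.get(prep) ?? "modifier", nountype, label: prep });
  }
  return args;
}

// Stand-in nountypes for the example.
const textNoun: NounType = { _name: "text" };
const languageNoun: NounType = { _name: "language" };

const translateCmd: HybridCommand = {
  names: ["translate"],
  takes: { text: textNoun },
  modifiers: { to: languageNoun, from: languageNoun },
};

console.log(parser2Arguments(translateCmd).map((a) => a.role).join(", ")); // object, goal, source
```

Letting explicit arguments win while falling back to the Parser 1 fields means one definition could serve both parsers during the transition. The English-only preposition table is exactly the part that would have to become localizable.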
In other news, here are some Small Issues:
Add a switch for parser version and language settings: Jono’s already made a space for this in the new “Settings and Skins” page in about:ubiquity. He’s on it. Like a bonnet.
Magic word (anaphor) substitution is not yet working properly. This needs to work both when there is an explicit magic word and when there are simply missing arguments.
The position of suggested verbs is always sentence-initial (trac #655). Fixing this also requires a way to specify whether localized verb names are sentence-initial or sentence-final forms.[^1]
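Here’s a rough TypeScript sketch of the last two Small Issues. The sample English magic words, the choice of which roles may be filled from the selection, and all function and type names are assumptions for illustration, not Parser 2’s actual implementation.

```typescript
// Sketch 1: magic word (anaphor) substitution.
// Hypothetical list of English magic words.
const MAGIC_WORDS = ["this", "that", "it", "selection", "him", "her", "them"];

type ParsedArgs = Record<string, string | undefined>; // role -> argument text

function substituteSelection(
  args: ParsedArgs,
  fillableRoles: string[], // roles that may be filled from the selection, in priority order
  selection: string,
  magicWords: string[] = MAGIC_WORDS,
): ParsedArgs {
  const out: ParsedArgs = { ...args };
  if (selection === "") return out; // nothing to substitute

  // Case 1: an argument consisting of a magic word becomes the selection.
  let used = false;
  for (const [role, value] of Object.entries(out)) {
    if (value !== undefined && magicWords.includes(value.trim().toLowerCase())) {
      out[role] = selection;
      used = true;
    }
  }

  // Case 2: no magic word was typed, so fill the first missing fillable role.
  if (!used) {
    const missing = fillableRoles.find((role) => !out[role]);
    if (missing !== undefined) out[missing] = selection;
  }
  return out;
}

// Sketch 2: verb position. Each localized verb name records whether it is a
// sentence-initial or sentence-final form.
type VerbPosition = "initial" | "final";

interface LocalizedVerbName {
  text: string;
  position: VerbPosition;
}

function renderSuggestion(verb: LocalizedVerbName, argumentPhrases: string[], joiner = " "): string {
  const args = argumentPhrases.join(joiner);
  if (args === "") return verb.text;
  return verb.position === "initial"
    ? `${verb.text}${joiner}${args}`
    : `${args}${joiner}${verb.text}`;
}

// "translate to Japanese" with "hello" selected: the missing object is filled in.
const filled = substituteSelection({ goal: "Japanese" }, ["object"], "hello");
console.log(filled.object); // hello

const translateEn: LocalizedVerbName = { text: "translate", position: "initial" };
const translateJa: LocalizedVerbName = { text: "翻訳して", position: "final" };
console.log(renderSuggestion(translateEn, ["hello", "to Japanese"])); // translate hello to Japanese
console.log(renderSuggestion(translateJa, ["helloを", "英語に"], "")); // helloを英語に翻訳して
```

Limiting case 2 to an explicit list of fillable roles keeps a single selection from being copied into, say, a goal argument, and the empty joiner reflects that Japanese is written without spaces between words.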