It’s time for one more installment of Nountype Quirks, where I review and tweak Ubiquity’s built-in nountypes. For an introduction to this effort, please read Judging Noun Types and my updates from Day 1 and Day 2.
Today I spent most of the day attempting to implement (but not yet completing) major improvements to the geolocation-related nountypes. I lay out the plans for those improvements here.
Note: this blog post includes a number of graphs using HTML/CSS formatting. If you are reading this article through a feed reader or planet, I invite you to read it on my site.
Today I’m continuing the process of reviewing and tweaking all of the nountypes built into Ubiquity. For a more respectable introduction to this endeavor, please read my blog post from a couple of days ago, Judging Noun Types, and my status update from yesterday, Nountype Quirks: Day 1.
Today I began the process of going through all of the nountypes built into Ubiquity using the principles and criteria I laid out yesterday, a task I’ve had in planning for a while now. As I explained yesterday, improved suggestions and scoring from the built-in nountypes could translate directly into better and smarter suggestions, resulting in a better experience for all users. Here I’ll document some of the nountype quirks I’ve discovered so far and what remedy has been implemented or is planned.
Different arguments are classified into different kinds of nouns in Ubiquity using noun types. For example, a string like “Spanish” could be construed as a language, while “14.3” should not be. These kinds of relations are then used by the parser to introduce, for example, language-related verbs (like translate) using the former argument, and number-related verbs (like zoom or calculate) based on the latter. Ubiquity nountypes aren’t exclusive: a single string can count as valid for a number of different nountypes, and in particular the “arbitrary text” nountype (noun_arb_text) will always accept any string given.
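To make the idea of non-exclusive nountypes a little more concrete, here is a minimal sketch in plain JavaScript. The object names, the score values, and the classify() helper are my own illustrative assumptions, not Ubiquity's actual built-ins. The only things taken from the discussion above are that a nountype offers suggestions for a string and that the arbitrary text nountype accepts anything.

```javascript
// Hypothetical nountypes: each suggest() returns zero or more scored
// suggestions for the input string. The scores are made-up values.
const nounLanguage = {
  name: "language",
  languages: ["English", "Spanish", "Japanese"],
  suggest(text) {
    const needle = text.toLowerCase();
    if (needle === "") return [];
    return this.languages
      .filter((lang) => lang.toLowerCase().startsWith(needle))
      .map((lang) => ({ text: lang, score: needle.length / lang.length }));
  },
};

const nounNumber = {
  name: "number",
  suggest(text) {
    // Accept plain integers and decimals only.
    return /^-?\d+(\.\d+)?$/.test(text) ? [{ text, score: 1 }] : [];
  },
};

const nounArbText = {
  name: "arbitrary text",
  suggest(text) {
    // Always accepts the input, but with a low (assumed) score.
    return [{ text, score: 0.3 }];
  },
};

// Hypothetical helper: names of every nountype that accepts the string.
function classify(text, nountypes) {
  return nountypes
    .filter((nt) => nt.suggest(text).length > 0)
    .map((nt) => nt.name);
}

const allNountypes = [nounLanguage, nounNumber, nounArbText];
console.log(classify("Spanish", allNountypes)); // language and arbitrary text
console.log(classify("14.3", allNountypes)); // number and arbitrary text
```

Note that both strings are accepted by more than one nountype, which is exactly why the scores matter: they are what lets the parser rank the competing readings.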
This video walks through the process of converting your Ubiquity commands to Ubiquity 0.5 with Parser 2. For more information, please consult the command conversion tutorial.
By now many of you have probably seen this new Microsoft Australia campaign, “Ten Grand Is Buried Here.com,”1 which calls Firefox “old” and Safari “boring”:
I’m not sure what this is saying about me, but my immediate reaction was to go check whether tengrandisburiedthere.com was available. To my surprise, Microsoft had yet to snatch it up! A few hours later, here’s the result:
Note: Not being a marketing guy, I just threw some text together to introduce Firefox. If someone has some better copy for this display, please let me know.
As of this writing, this domain actually has yet to serve anything. ↩
I recently noticed that some of my blog posts, most notably my Templates in YARPP 3 article, were producing a PHP error:
Warning: preg_match() [function.preg-match]: Compilation failed: unrecognized character after (?< at offset 3 in /…/html/blog/wp-content/plugins/wp-syntax/geshi/geshi.php on line 2132
This seemed to be coming from version 1.0.8.4 of Geshi, which I had installed. A quick Google search for “geshi line 2132” gives you over a thousand results, so this seems to be a common issue. Geshi is a fabulous and popular syntax highlighter and is the core component of the WP-Syntax plugin for WordPress.
I did some digging around and realized that the issue was with the compilation of this monstrosity of a regular expression, used (as far as I can tell) to identify PHP code snippets, for example the <?php … ?> keywords:
Not knowing exactly where to start in diagnosing this crazy expression, I simply disabled those “script delimiters” in the geshi/php.php file. The sections I commented out are lines 1080-1101. Now the script delimiters like <?php don’t get highlighted nicely, but I feel that’s a small price to pay for eliminating these errors. Another solution for the WP-Syntax users seems to be to downgrade to 0.9.4. Hopefully in the near future an update to Geshi will come out which fixes this issue once and for all.
localization of standard feed commands for a few languages
Parser 2 language files for those same languages
Nongoals for 0.5
distribution/sharing of localizations
localization of nountypes
The overall goal for this release of Ubiquity is to come up with a format and standard for localization. Localizations in Ubiquity 0.5 will only apply to commands bundled with Ubiquity, and the localization files themselves will be distributed with Ubiquity. In a future release we will tackle the problem of localizations for commands in the wild and truly crowd-source this process.
This past Monday I presented at Tokyo 2.0, Japan’s largest bilingual web/tech community. I presented as part of a session on The Web and Language, which I also helped organize. Other presenters included Junji Tomita from goo Labs, Shinjyou Sunao of Knowledge Creation, developers of the Voice Delivery System API, and Chris Salzberg of Global Voices Online on community translation.
I just put together a video of my Ubiquity presentation, mixing the audio recorded live at the presentation together with a screencast of my slides for better visibility. The presentation is 10 minutes long and is bilingual, English and Japanese.
Here’s a quick screencast highlighting some of the changes to Parser 2 and the updated Parser 2 Playpen. This video should be particularly useful to people hoping to add their language to Parser 2. It’s also a good reference for Ubiquity core developers.
On Saturday I went to Mozilla Party 10, a community event organized by Mozilla-gumi (もじら組). Mozilla-gumi has been an active community in Japan for the past 10 years, making it one of the oldest Mozilla communities around. Despite the cloudy weather in Shinjuku and the ever-present swine flu scare, we had over 100 people attending.
Now that Parser 2 is in decent shape and a number of parsing problems in different languages have been tackled, the focus has shifted to coming up with an approach for localizing Ubiquity commands and nountypes. At last week’s Ubiquity meeting we had a great conversation on this subject, which has since continued on the Google group.
I’ve been framing this problem as two subproblems:
What will be the data structure of localized commands/nountypes within Ubiquity?
How do we distribute/share these localizations?
We’ve mostly been discussing the first problem, weighing the merits of unified objects (with different localized text as different JS properties) as opposed to a gettext-style approach, and noting that our requirements for commands and nountypes may be different. I hope we can discuss the second issue more in the coming week.
Should everything go through the command author? Should localization be centralized through some web tool? Should it be completely distributed like commands currently are? I invite you to join us in this conversation on the Google group. ^^
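For readers who haven't followed the thread, here is a rough sketch of the two data-structure options for the first subproblem. Everything in it (the property names, the catalogs object, the gettext() helper) is a hypothetical illustration of the two styles, not a format we have agreed on.

```javascript
// Option A (hypothetical): one unified object, with the localized text
// stored as different JS properties keyed by language code.
const unifiedCommand = {
  names: { en: ["translate"], it: ["traduci"] },
};

function nameFromUnified(cmd, lang) {
  // Fall back to English when the language has no entry.
  const names = cmd.names[lang] || cmd.names.en;
  return names[0];
}

// Option B (hypothetical): gettext-style. The command keeps its English
// strings, and each language supplies a separate catalog keyed by them.
const plainCommand = { name: "translate" };
const catalogs = { it: { translate: "traduci" } };

function gettext(catalog, msgid) {
  const found =
    catalog && Object.prototype.hasOwnProperty.call(catalog, msgid);
  // An untranslated string falls back to the original.
  return found ? catalog[msgid] : msgid;
}

function nameFromCatalog(cmd, lang) {
  return gettext(catalogs[lang], cmd.name);
}

console.log(nameFromUnified(unifiedCommand, "it"));
console.log(nameFromCatalog(plainCommand, "it"));
```

In the unified style the command author's file holds every translation, while in the gettext style the catalogs can live (and be distributed) separately from the command, which is where the first subproblem starts to touch the second.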
Jono and I had a good conversation this morning on IRC about the remaining Big Issues which are blocking the release of Parser 2 as the default parser for Ubiquity. Here are our Top 4 Big Issues:
Some commands’ preview and execute functions are not working properly (trac #652). This could be an underlying issue with some pipes not rerouted correctly in Parser 2, or it could be that the commands have not been rewritten correctly to take advantage of Parser 2.
Flesh out how to localize resources, like commands and nountypes. We started a conversation on this subject a few weeks ago but we never reached a resolution. This blocks issues 3 and 4 below.
We need to standardize a format for commands for Parser 2. As noted in last week’s meeting (among other places), Parser 2 will require at least some modification to all commands. Jono and I came up with a simple hybrid format for commands which specifies takes and modifiers for Parser 1 and arguments for Parser 2, but until we figure out how exactly the localization of commands will work, we can’t write a definitive standard.
Enable nountype localization. While the most popular nountypes used are those that ship with Ubiquity, it is important to come up with a localization process which can apply to custom nountypes as well. Nountype localizations need the ability to either (1) replace the _name only, or (2) replace both the _name and the suggest() logic, as both cases will be necessary.
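As a rough illustration of the hybrid format from Big Issue 3, a single command object could carry both descriptions of its arguments side by side. This is only a sketch: the takes, modifiers and arguments property names come from the description above, but the role names, the stand-in nountype objects, and the argumentKeys() helper are assumptions of mine rather than a settled standard.

```javascript
// Hypothetical stand-ins for nountypes; a real command would reference
// Ubiquity's own nountype objects here.
const nounText = { _name: "text" };
const nounLang = { _name: "language" };

// One command object carrying both descriptions of its arguments.
const translateCommand = {
  name: "translate",
  // Parser 1 would read these two properties:
  takes: { "text to translate": nounText },
  modifiers: { from: nounLang, to: nounLang },
  // Parser 2 would read this one (the role names are assumptions):
  arguments: [
    { role: "object", nountype: nounText },
    { role: "source", nountype: nounLang },
    { role: "goal", nountype: nounLang },
  ],
};

// Hypothetical helper: the argument labels a given parser version sees.
function argumentKeys(command, parserVersion) {
  if (parserVersion === 2) {
    return command.arguments.map((arg) => arg.role);
  }
  return Object.keys(command.takes).concat(Object.keys(command.modifiers));
}

console.log(argumentKeys(translateCommand, 1));
console.log(argumentKeys(translateCommand, 2));
```

The Parser 1 keys are English words, which is part of why this format can't be finalized before the localization question is answered.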
Given that Big Issue 3 and Big Issue 4 are both dependent on Big Issue 2, there clearly needs to be a continued public discussion of how we should make these resources localizable. I look forward to this discussion taking place at tomorrow’s joint (general + i18n) Ubiquity meeting.
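To show what the two cases in Big Issue 4 might look like, here is a small sketch. The _name property and suggest() method are the ones named above, but the localizeNountype() helper, the base nountype, and its data are hypothetical and only meant to show the shape of the problem.

```javascript
// Hypothetical base nountype with English language names.
const baseLanguage = {
  _name: "language",
  _languages: { English: "en", Chinese: "zh" },
  suggest(text) {
    const needle = text.toLowerCase();
    if (!needle) return [];
    return Object.keys(this._languages)
      .filter((name) => name.toLowerCase().startsWith(needle))
      .map((name) => ({ text: name, data: this._languages[name] }));
  },
};

// Hypothetical helper: a localized nountype inherits from the base one
// and overrides only what the localization supplies.
function localizeNountype(base, overrides) {
  return Object.assign(Object.create(base), overrides);
}

// Case 1: replace the _name only; suggest() is inherited unchanged.
const renamedLanguage = localizeNountype(baseLanguage, { _name: "lingua" });

// Case 2: replace both the _name and the suggest() logic, here so that
// Italian language names are matched instead of English ones.
const italianLanguage = localizeNountype(baseLanguage, {
  _name: "lingua",
  suggest(text) {
    const names = { inglese: "en", cinese: "zh" };
    const needle = text.toLowerCase();
    if (!needle) return [];
    return Object.keys(names)
      .filter((name) => name.startsWith(needle))
      .map((name) => ({ text: name, data: names[name] }));
  },
});

console.log(renamedLanguage.suggest("eng"));
console.log(italianLanguage.suggest("cin"));
```

Case 1 is enough when only the label shown to the user changes. Case 2 is needed when the strings the user types are themselves language-specific.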
In other news, here are some Small Issues:
Add a switch for parser version and language settings: Jono’s already made a space for this in the new “Settings and Skins” page in about:ubiquity. He’s on it. Like a bonnet.
Magic word (anaphor) substitution is not yet working properly. This needs to work both when there is an explicit magic word and when there are simply missing arguments.
The position of suggested verbs is always sentence-initial (trac #655). Fixing this also requires a way to specify whether verb name localizations are sentence-initial forms or sentence-final forms.
Thanks to the great work of Sandro Della Giustina, we now have a preliminary Italian parser for use with Ubiquity Parser 2. Sandro brought up a good point, however, about Italian prepositions which contract with the article and the head noun. For example,
traduci dall'inglese al cinese
translate from=the=English to=the Chinese
One current solution is to add zero-width spaces after these contracted articles, all’ and dall’.1 The appropriate way to add this to the parser is by defining a custom wordBreaker() method.
As John Daggett pointed out to me, in the future we may have to add an intermediate shallow parse instead of adding characters (in this case, the zero-width space) to the modified input. ↩
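Here is a minimal sketch of what such a wordBreaker() could look like. I'm assuming it takes the raw input string and returns a modified string. The list of contractions only covers the two forms from the example above, and a real Italian parser would need a fuller list.

```javascript
// Zero-width space, inserted so later parsing steps see a word boundary.
const ZWSP = "\u200b";

// Contracted preposition+article forms from the example only.
const contractions = ["all", "dall"];

// Sketch of a custom wordBreaker(): add a zero-width space after each
// contracted article, accepting either a straight or a curly apostrophe.
function wordBreaker(input) {
  const pattern = new RegExp(
    "(^|\\s)(" + contractions.join("|") + ")(['’])",
    "gi"
  );
  return input.replace(pattern, "$1$2$3" + ZWSP);
}

const broken = wordBreaker("traduci dall'inglese al cinese");
console.log(JSON.stringify(broken)); // JSON.stringify shows the string as a quoted literal
```

The contraction must be preceded by whitespace or the start of the input, so the all inside dall' is not matched a second time, and al cinese is left untouched because it has no apostrophe.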
Here’s a little picture of the different sections of text in a single parsed argument and which properties of the resultant argument object they are assigned to.
You’ll see, from left to right, outerSpace, modifier, innerSpace, inactivePrefix, input/data, inactiveSuffix.
The example text is from the Catalan example, “compra mitjons amb el Google,” meaning “buy socks with Google.” You’ll notice the argument “amb el Google” is literally “with the Google.” The normalizeArgument() method of the Catalan parser, as I described earlier this week, strips the article “el ” and puts it in the inactivePrefix property of the argument.
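In code, the pieces of that example argument could be put together roughly like this. The property names are the ones listed above, but this normalizeArgument() is a simplified stand-in with an assumed return shape, and buildArgument() is a hypothetical helper, not the parser's actual code.

```javascript
// Simplified stand-in for the Catalan normalizeArgument(): move a leading
// article "el " out of the input. The return shape is an assumption.
function normalizeArgument(input) {
  const match = /^(el\s+)(.+)$/i.exec(input);
  if (!match) return { prefix: "", newInput: input, suffix: "" };
  return { prefix: match[1], newInput: match[2], suffix: "" };
}

// Hypothetical helper: assemble the argument object from its sections.
function buildArgument(outerSpace, modifier, innerSpace, rest) {
  const normalized = normalizeArgument(rest);
  return {
    outerSpace: outerSpace,
    modifier: modifier,
    innerSpace: innerSpace,
    inactivePrefix: normalized.prefix,
    input: normalized.newInput,
    inactiveSuffix: normalized.suffix,
  };
}

// The " amb el Google" part of "compra mitjons amb el Google".
const googleArgument = buildArgument(" ", "amb", " ", "el Google");
console.log(googleArgument);
```

Concatenating the six properties from left to right gives back the original text of the argument, so nothing from the user's input is lost when the article is set aside.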
I’m going to spend the rest of the day updating the Parser 2 design doc and related documentation so they match these and other recent developments in the parser.