mitcho Michael 芳貴 Erlewine

Linguist. Fifth year PhD student at MIT.


Posts Tagged ‘arguments’

The Aliens Aliases Have Landed

Friday, September 4th, 2009


This week I implemented a new way to customize and extend Ubiquity commands: CmdUtils.CreateAlias.

The use case for and importance of CreateAlias

CreateAlias lets you easily create a special-case alias of another, more generic verb. Ubiquity comes bundled with useful verbs like translate and search, which can serve a number of different purposes depending on their arguments. In some cases, and in some languages, though, typing out translate to English or search with Google is unnatural, as there is a more succinct and direct way to make that request. For example, in English one could say “anglicize” or “google”, respectively, for the verbs and arguments above. Indeed, in order to support both search with Google and google, Ubiquity has traditionally implemented two different verbs, search and google, which duplicate functionality and code.

CreateAlias lets us create such natural aliases without repeating ourselves. We can easily create an anglicize verb which, in one word, does the work of translate to English, or a google verb which is semantically equivalent to search with Google.

These sorts of aliases become particularly important in our perpetual quest to internationalize Ubiquity. One point raised early in the discussions on our Ubiquity-i18n list is that not all languages have the verb “Google”: in many languages it is necessary to say “search with Google” explicitly. Moreover, other languages may have domain-specific verbs that English lacks. Maybe some language has a special verb for “email with Hotmail” or “map Denmark”. Who knows? With CreateAlias we can easily enable such localizations based on the more generic commands bundled with Ubiquity.

Creating an alias

CreateAlias was designed to be incredibly simple to use. Here’s an example that will be bundled (but not installed by default) in Ubiquity:

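CmdUtils.CreateAlias({
  names: ["anglicize"],           // names for the new alias
  verb: "translate",              // the target verb this alias uses
  givenArgs: { goal: "English" }  // pre-specified arguments, keyed by semantic role
});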

As you can see, the syntax is straightforward. There are two required properties: names, an array of names for the alias, and verb, a reference to the target verb that this alias should use.1

The alias can also have a givenArgs property, a hash mapping semantic roles to pre-specified arguments. Because translate takes three arguments (an object text, a goal language, and a source language) but we have pre-specified a goal in givenArgs, the new anglicize command takes only two arguments: the object text and a source language. If you specify no givenArgs at all, you get a simple synonym of the target verb, without needing access to the original verb’s code.
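To make the effect of givenArgs concrete, here is a minimal sketch of how an alias could wrap its target verb. It is not Ubiquity’s actual implementation: the translate stand-in and the createAlias helper are hypothetical, arguments are reduced to plain strings keyed by semantic role, and the verb is passed as an object rather than looked up by name.

```javascript
// Hypothetical stand-in for a generic verb. Arguments are plain strings
// keyed by semantic role (object, goal, source).
const translate = {
  names: ["translate"],
  arguments: ["object", "goal", "source"],
  execute(args) {
    return `translate "${args.object}" from ${args.source || "(auto)"} to ${args.goal}`;
  },
};

// Hypothetical alias builder: roles listed in givenArgs are fixed, so the
// alias only exposes the target verb's remaining roles.
function createAlias({ names, verb, givenArgs = {} }) {
  const fixedRoles = Object.keys(givenArgs);
  return {
    names,
    arguments: verb.arguments.filter((role) => !fixedRoles.includes(role)),
    execute(args) {
      // In this sketch the pre-specified arguments always win.
      return verb.execute({ ...args, ...givenArgs });
    },
  };
}

const anglicize = createAlias({
  names: ["anglicize"],
  verb: translate,
  givenArgs: { goal: "English" },
});

console.log(anglicize.arguments); // [ 'object', 'source' ]
console.log(anglicize.execute({ object: "こんにちは", source: "Japanese" }));
// translate "こんにちは" from Japanese to English
```

Because the alias only forwards to translate, any later improvement to translate automatically carries over to anglicize.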

[Screenshot: the anglicize command]

As you can see, the preview of this command is simply the preview of the translate verb. Its preview and execution behave exactly as if you had entered translate こんにちは to English.

As with other commands created with CreateCommand, the object specifying the alias can also have properties like help, description, author information, and so on. I used the icon property to give it a Union Jack icon so that it is easy to identify.

Bonus: using CmdUtils.previewCommand and CmdUtils.executeCommand

On the road to implementing CreateAlias, I also implemented the CmdUtils.previewCommand and CmdUtils.executeCommand functions. Most of this code comes from previous work by Louis-Rémi Babé, though I adapted it to the modern Ubiquity system. With previewCommand and executeCommand you can reuse the preview or execute functionality of another command. In the new alias-commands feed I included a command called germanize, which is essentially the German analogue of anglicize above, but built with these functions inside a CreateCommand. CreateAlias is much simpler for plain aliases, but for more complex subcommands, where you want to adapt another verb’s execution or preview, or borrow one of the two and re-implement the other, previewCommand and executeCommand are the way to go.
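The idea of borrowing one half of another command can be sketched as follows. The two helpers below only mimic the spirit of CmdUtils.previewCommand and CmdUtils.executeCommand; their names, signatures, and the simplified args-only preview and execute functions are assumptions, and germanizeToClipboard is a made-up command used only to show the "borrow one half, re-implement the other" case.

```javascript
// Hypothetical stand-in for an existing verb. Real Ubiquity preview and
// execute functions receive more than an args object; this sketch omits that.
const translate = {
  names: ["translate"],
  preview(args) {
    return `Preview: "${args.object}" translated to ${args.goal}`;
  },
  execute(args) {
    return `Inserted "${args.object}" translated to ${args.goal}`;
  },
};

// Hypothetical helpers in the spirit of CmdUtils.previewCommand and
// CmdUtils.executeCommand; the real functions' signatures may differ.
function previewCommand(command, args) {
  return command.preview(args);
}
function executeCommand(command, args) {
  return command.execute(args);
}

// germanize-style command: both halves borrowed, with the goal fixed.
const germanize = {
  names: ["germanize"],
  preview: (args) => previewCommand(translate, { ...args, goal: "German" }),
  execute: (args) => executeCommand(translate, { ...args, goal: "German" }),
};

// Hypothetical variant: borrow only the preview, re-implement execution.
const germanizeToClipboard = {
  names: ["germanize-to-clipboard"],
  preview: germanize.preview,
  execute: (args) =>
    `Would copy the German translation of "${args.object}" to the clipboard`,
};

console.log(germanizeToClipboard.preview({ object: "hello" }));
// Preview: "hello" translated to German
```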


  1. The verb reference can be either the canonical (reference) name of a command, which is the first entry in the command’s names (and the name listed in the command list when Ubiquity is running in English), or the command’s internal ID, which looks like resource://ubiquity/standard-feeds/general.html#translate.
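A minimal sketch of how such a reference might be resolved, assuming a hypothetical registry of commands keyed by internal ID. Whether secondary names (anything after the first entry in names) are also accepted is not stated above; this sketch rejects them.

```javascript
// Hypothetical registry of installed commands, keyed by internal ID. The
// translate ID follows the form given in the footnote; the second entry is
// made up for illustration.
const commandsById = {
  "resource://ubiquity/standard-feeds/general.html#translate": { names: ["translate"] },
  "urn:example:hypothetical#lookup": { names: ["lookup", "find"] },
};

// Resolve a verb reference: first try an exact internal ID, then the
// reference name (the first entry in a command's names).
function resolveVerb(ref) {
  if (Object.prototype.hasOwnProperty.call(commandsById, ref)) {
    return commandsById[ref];
  }
  for (const command of Object.values(commandsById)) {
    if (command.names[0] === ref) return command;
  }
  return null; // unknown reference; secondary names like "find" are not accepted here
}

console.log(resolveVerb("translate") ===
  resolveVerb("resource://ubiquity/standard-feeds/general.html#translate")); // true
```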

Nountype Quirks: Day 3: Geo Day

Saturday, August 1st, 2009

It’s time for one more installment of Nountype Quirks, where I review and tweak Ubiquity’s built-in nountypes. For an introduction to this effort, please read Judging Noun Types and my updates from Day 1 and Day 2.

Today I spent most of the day working on (but not yet completing) major improvements to the geolocation-related nountypes, whose plans I lay out here.

Note: this blog post includes a number of graphs using HTML/CSS formatting. If you are reading this article through a feed reader or planet, I invite you to read it on my site.

Nountype Quirks: Day 2

Thursday, July 30th, 2009

Today I’m continuing the process of reviewing and tweaking all of the nountypes built into Ubiquity. For a more respectable introduction to this endeavor, please read my blog post from a couple of days ago, Judging Noun Types, and my status update from yesterday, Nountype Quirks: Day 1.

Note: this blog post includes a number of graphs using HTML/CSS formatting. If you are reading this article through a feed reader or planet, I invite you to read it on my site.


Nountype Quirks: Day 1

Wednesday, July 29th, 2009

Today I began the process of going through all of the nountypes built into Ubiquity using the principles and criteria I laid out yesterday, a task I’ve been planning for a while now. As I explained yesterday, improving the suggestions and scoring of the built-in nountypes could translate directly into smarter overall suggestions and a better experience for all users. Here I’ll document some of the nountype quirks I’ve discovered so far and what remedy has been implemented or is planned for each.

Note: this blog post includes a number of graphs using HTML/CSS formatting. If you are reading this article through a feed reader or planet, I invite you to read it on my site.


Judging Noun Types

Wednesday, July 29th, 2009

Introduction

Ubiquity classifies different arguments into different kinds of nouns using noun types.1 For example, a string like “Spanish” could be construed as a language, while “14.3” should not be. The parser then uses these classifications to suggest, for example, language-related verbs (like translate) for the former argument and number-related verbs (like zoom or calculate) for the latter. Ubiquity nountypes aren’t exclusive: a single string can be valid for a number of different nountypes, and in particular the “arbitrary text” nountype (noun_arb_text) will always accept any string given.
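As a rough illustration of this non-exclusivity, the sketch below reduces each nountype to a yes/no test and asks which ones accept a given string. The short list standing in for noun_type_language and the number nountype are hypothetical simplifications; only noun_arb_text’s accept-everything behaviour comes from the description above.

```javascript
// Hypothetical, heavily simplified nountypes: each just says whether it
// accepts a string. None of these is Ubiquity's real implementation.
const nountypes = {
  // Arbitrary text accepts any string at all.
  noun_arb_text: () => true,
  // Stand-in for noun_type_language: a tiny fixed list of language names.
  noun_type_language: (text) =>
    ["english", "spanish", "japanese", "german"].includes(text.trim().toLowerCase()),
  // Hypothetical number nountype: integers and decimals.
  number: (text) => /^-?\d+(\.\d+)?$/.test(text.trim()),
};

// Every nountype that accepts the text; more than one may match.
function acceptingNountypes(text) {
  return Object.keys(nountypes).filter((name) => nountypes[name](text));
}

console.log(acceptingNountypes("Spanish")); // [ 'noun_arb_text', 'noun_type_language' ]
console.log(acceptingNountypes("14.3")); // [ 'noun_arb_text', 'number' ]
```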

In addition to the various built-in nountypes, Ubiquity lets command authors write their own nountypes as well.

The functions of a noun type

Nountypes have two functions: the first is accepting input and producing suggestions, and the second is scoring those suggestions.
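A minimal sketch of a nountype with both functions, assuming a hypothetical four-entry language table and a made-up scoring rule (the fraction of the name typed so far). The suggestion fields text, data, and score are a simplification for illustration, not a definitive description of Ubiquity’s suggestion objects.

```javascript
// Hypothetical language table: display name -> language code.
const LANGUAGES = { english: "en", spanish: "es", japanese: "ja", german: "de" };

const languageNountype = {
  label: "language",
  // 1. Accepting and suggesting: turn raw input into candidate suggestions.
  suggest(text) {
    const input = text.trim().toLowerCase();
    if (!input) return [];
    return Object.entries(LANGUAGES)
      .filter(([name]) => name.startsWith(input))
      .map(([name, code]) => ({ text: name, data: code, score: this.score(input, name) }))
      .sort((a, b) => b.score - a.score);
  },
  // 2. Scoring: an exact match scores 1; a prefix scores the fraction typed.
  score(input, name) {
    return input.length / name.length;
  },
};

console.log(languageNountype.suggest("Spanish")[0].score); // 1
```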



  1. Or, as I often write them, “nountypes.” 

Solving a Romantic Problem: Portmanteau’ed Prepositions

Monday, May 11th, 2009

The problem:

In many Romance languages, prepositions and articles often form portmanteau morphs, combining into a single word.1 Some examples include (French) à + le > au, de + le > du, and (Catalan) a + el > al, de + les > dels, per + el > pel. Italian has a particularly productive system of portmanteau’ed prepositions and articles; I refer you to the Italian section of the Wikipedia article on contraction (grammar).

As I noted a couple weeks ago, however, some combinations do not form portmanteaus.2
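One way to read the argument normalization strategy mentioned in the footnotes is to split each portmanteau back into its preposition and article before parsing, so that the parser only needs to know the bare prepositions. The sketch below is a hypothetical, whitespace-tokenized version using just the French and Catalan examples above; the post’s full strategy is not reproduced here.

```javascript
// Portmanteau forms from the examples above, mapped back to their parts.
const PORTMANTEAUS = {
  fr: { au: ["à", "le"], du: ["de", "le"] },
  ca: { al: ["a", "el"], dels: ["de", "les"], pel: ["per", "el"] },
};

// Split any portmanteau'ed preposition into preposition + article.
// Combinations that never contract (e.g. French "à la") pass through as-is.
function normalizeArguments(input, lang) {
  const table = PORTMANTEAUS[lang] || {};
  return input
    .split(/\s+/)
    .filter(Boolean)
    .flatMap((word) => table[word.toLowerCase()] || [word])
    .join(" ");
}

console.log(normalizeArguments("envoyer ce lien au directeur", "fr"));
// envoyer ce lien à le directeur
```

A naive table like this has limits: French du, for instance, is also a partitive article (du pain, “some bread”), so a real normalizer would need context to decide when to split it.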



  1. Thanks to Jeremy O’Brien for helping me figure out how to refer to this phenomenon. 

  2. This also relates to the issue of parsing multi-word delimiters, though the argument normalization strategy covered here should reduce the need for multi-word delimiters. 

Adding Your Language to Ubiquity Parser 2

Wednesday, April 29th, 2009

NOTE: This blog post has now been added to the Ubiquity wiki and is updated there. Please disregard this article and instead follow these instructions.

You’ve seen the video. You speak another language. And you’re wondering, “how hard is it to add my language to Ubiquity with Parser 2?” The answer: not that hard. With a little bit of JavaScript and some knowledge of and interest in your own language, you’ll be able to get at least rudimentary Ubiquity functionality in your language. Follow along with this step-by-step guide, and please submit your language files, even incomplete ones!

As Ubiquity Parser 2 evolves, there is a chance that this specification will change in the future. Keep abreast of such changes on the Ubiquity Planet and/or this blog (RSS).


A Demonstration of Ubiquity Parser 2

Friday, April 24th, 2009

Here’s a quick demonstration of Ubiquity Parser 2, aka “the new parser.” I’ll show you how you can use the parser yourself and point out some highlights of the new functionality.


Ubiquity Parser 2: better noun-first suggestions and command localization from mitcho on Vimeo.


Attachment Ambiguity—or—when is the gyudon cheap?

Wednesday, April 15th, 2009

[Photo: part of a Yoshinoya promotional sign]

Every day on the way to work I walk by a fine establishment known as Yoshinoya (吉野家), Japan’s largest gyudon (牛丼) chain restaurant. For those of you whose lives have yet to be graced by gyudon, it’s a bowl of rice topped with beef and onions stewed in a sweet-savory soy-based sauce. Loving gyudon and being a cheapskate, I naturally noticed Yoshinoya’s recent 50-yen-off gyudon promotion. The photo above shows part of the promotion’s sign.

Part of this sign, though, made me think about our new Ubiquity parser: in particular, the attachment ambiguity in the end date of the promotion. The text in the photo above literally reads “April 15th (Wed.) 8PM until”. (Note that Japanese is a strongly head-final language, and that “until” is a postposition.) There are two possible readings for this expression, as illustrated by the two composition trees below.


Rolling out the Roles

Thursday, April 9th, 2009

Jono and I have recently been working to incorporate the next-generation parser (Ubiquity Parser: The Next Generation) into Ubiquity proper, which of course involves retooling the standard commands with semantic roles. The first step, however, is to come up with a list of universal semantic roles, which the verbs will be rewritten to use and which individual languages’ parsers will be built to identify. Today I have just such a proposal.


Ubiquity Commands by The Numbers

Wednesday, April 1st, 2009

Recent work in the Ubiquity internationalization realm has focused on the upcoming Ubiquity parser, which will bring some great new features to Ubiquity, including support for overlord verbs and semi-automatic localization of commands via semantic roles. It’s possible, though, that these new features will break backwards compatibility with the current command specification and noun types. Creative destruction for the win.

As we look to move forward with incorporating the next-generation parser into Ubiquity proper, it becomes important to look at the current command ecosystem to see how disruptive this move might be. To this end, last night I wrote a quick Perl script to scrape the commands cached on the herd and get some quantitative answers to my questions.


Ubiquity Parser: The Next Generation Demo

Wednesday, March 18th, 2009


A week or two ago while visiting California, Jono and I had a productive charrette, resulting in a new architecture proposal for the Ubiquity parser, as laid out in Ubiquity Parser: The Next Generation. The new architecture is designed to support (1) the use of overlord verbs, (2) writing verbs by semantic roles, and (3) better suggestions for verb-final languages and other argument-first contexts. I’m happy to say that I’ve spent some time putting a proof-of-concept together.

I’ve implemented the basic algorithm of this parser for left-branching languages (like English) and also implemented some fake English verbs, noun types, and semantic roles. This demo should give you a basic sense of how this parser will attempt to identify different types of arguments and check their noun types even without clearly knowing the verb. This should make the suggestion ranking much smarter, particularly for verb-final contexts. (For a good example, try from Tokyo to San Francisco.)
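To give a sense of how arguments can be identified before the verb is known, here is a much-simplified sketch for English-style input where each delimiter precedes its argument. The delimiter table, the source and goal role names, and the three-city list standing in for a place nountype are all hypothetical, and unlike the demo this returns a single greedy parse rather than ranked alternatives.

```javascript
// Hypothetical English delimiters mapped to semantic roles.
const DELIMITERS = { from: "source", to: "goal" };

// Hypothetical place nountype: a tiny list stands in for real geo lookup.
const PLACES = ["tokyo", "san francisco", "boston"];
const isPlace = (text) => PLACES.includes(text.toLowerCase());

// Each delimiter claims the words after it, up to the next delimiter.
// Words before any delimiter go to the object role. No verb is needed.
function parseArguments(input) {
  const args = {};
  let role = "object";
  let words = [];
  const flush = () => {
    if (words.length) args[role] = words.join(" ");
    words = [];
  };
  for (const word of input.split(/\s+/).filter(Boolean)) {
    const nextRole = DELIMITERS[word.toLowerCase()];
    if (nextRole) {
      flush();
      role = nextRole;
    } else {
      words.push(word);
    }
  }
  flush();
  return args;
}

// Check each identified argument against the place nountype.
const args = parseArguments("from Tokyo to San Francisco");
for (const [role, text] of Object.entries(args)) {
  console.log(role, text, isPlace(text));
}
// source Tokyo true
// goal San Francisco true
```

Since both arguments look like places even without a verb, a verb whose source and goal arguments take places could be ranked above, say, translate.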

➔ Check out the Ubiquity next-gen parser demo


User-Aided Disambiguation: a demo

Saturday, March 14th, 2009

A few weeks ago I made some visual mockups of how Ubiquity could look and act in Japanese. Part of this proposal was what I called “particle identification”: that is, immediate in-line identification of delimiters of arguments, which can be overridden by the user.

The inspiration for this idea came from Aza’s blog post “Solving the ‘it’ problem”, which advocates this type of quick feedback to the user in cases of ambiguity. Such a method would both help the user better understand how the system is interpreting the input and give the user an opportunity to correct improper parses. I just tried mocking up such an input box using jQuery.
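The heart of such an input box is a guess about where the particles are, plus a way for the user to overturn a guess. A minimal sketch, assuming a hypothetical list of five Japanese particles and plain string matching; the demo’s actual jQuery code is not reproduced here.

```javascript
// Hypothetical particle list; a real system would need more.
const PARTICLES = ["から", "まで", "を", "に", "で"];

// Scan left to right and record every known particle. Multi-character
// particles such as まで are matched whole, so their final で is not
// reported separately.
function findParticles(input) {
  const found = [];
  for (let i = 0; i < input.length; ) {
    const particle = PARTICLES.find((p) => input.startsWith(p, i));
    if (particle) {
      found.push({ index: i, particle, active: true });
      i += particle.length;
    } else {
      i += 1;
    }
  }
  return found;
}

// Let the user override a guess, e.g. by clicking a highlighted particle.
function toggleParticle(found, index) {
  return found.map((p) => (p.index === index ? { ...p, active: !p.active } : p));
}

const guesses = findParticles("これをにほんごに翻訳");
console.log(guesses.map((g) => `${g.particle}@${g.index}`)); // [ 'を@2', 'に@3', 'に@7' ]
const corrected = toggleParticle(guesses, 3); // the user rejects に@3
```

In これをにほんごに翻訳 (“translate this into Japanese”), the に at the start of にほんご is a false positive, which is exactly the kind of guess the user should be able to switch off.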

Try the User-Aided Disambiguation Demo

If you have any bugfixes to submit or want to play around with your own copy, the demo code is up on BitBucket. ^^ Let me know what you think!

Writing commands with semantic roles

Tuesday, February 24th, 2009

Thank you to everyone who contributed data to how your language identifies its arguments! The data collection is ongoing so please contribute data points for languages you know!

How Ubiquity identifies its arguments

Currently, when writing a command in Ubiquity you must specify two properties for each argument: a modifier (the appropriate adposition, for every argument except the direct object) and the noun type. Here are some quick examples from the standard commands:

email:

  • direct object (noun_arb_text)
  • to (noun_type_contact)

translate:

  • direct object (noun_arb_text)
  • to (noun_type_language)
  • from (noun_type_language)

This way of specifying arguments has a few shortcomings. First, it requires you to identify each type of argument by a unique adposition, which supports neither languages with case marking nor languages with sets of synonymous adpositions (e.g. French {à la, au, aux}). Second, as we saw in how your language identifies its arguments, some languages don’t mark semantic roles on the arguments at all, and the current system of specifying arguments is completely incompatible with these languages. Third, the current specification requires command authors to make localized versions of their commands, specifying the language-appropriate modifiers.
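The title of this post points at the alternative: writing commands with semantic roles. A minimal sketch of that separation, assuming hypothetical delimiter tables: the command lists its arguments by role (object, goal, source) with no language-specific modifiers, and each language supplies its own mapping from delimiters to roles, so synonymous delimiters simply share a role. Case-marking languages and languages without role marking would need more than a lookup table; this sketch does not address them.

```javascript
// The command declares arguments by semantic role only, reusing the
// nountypes from the translate example above. The shape is hypothetical.
const translateSpec = {
  names: ["translate"],
  arguments: {
    object: "noun_arb_text",
    goal: "noun_type_language",
    source: "noun_type_language",
  },
};

// Each language supplies its own delimiter-to-role table (hypothetical).
// Several synonymous delimiters may map to the same role.
const ROLE_DELIMITERS = {
  en: { to: "goal", from: "source" },
  fr: { "à": "goal", au: "goal", aux: "goal", de: "source" },
};

// Which argument of the command does this delimiter introduce?
function argumentFor(spec, lang, delimiter) {
  const role = (ROLE_DELIMITERS[lang] || {})[delimiter.toLowerCase()];
  return role && role in spec.arguments
    ? { role, nountype: spec.arguments[role] }
    : null;
}

console.log(argumentFor(translateSpec, "fr", "aux"));
// { role: 'goal', nountype: 'noun_type_language' }
```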


Ubiquity in Firefox: Focus on Japanese

Friday, February 20th, 2009

One of the eventual goals of the Ubiquity project is to bring some of its functionality and ideas to Firefox proper. To this end, Aza has been exploring some possible options for what that would look like (round 1, round 2). All of his mockups, however, use English examples. I’m going to start exploring what Ubiquity in Firefox might look like in different kinds of languages. Let’s kick this off with my mother tongue, Japanese.1

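Going forward I will be looking at how Ubiquity in Firefox might handle a variety of languages, and today I am taking up Japanese. I plan to post the same content in Japanese at a later date. ^^ Comments in Japanese are also very welcome!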
