<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>mitcho.com &#187; arguments</title>
	<atom:link href="http://mitcho.com/blog/tag/arguments/feed/" rel="self" type="application/rss+xml" />
	<link>http://mitcho.com</link>
	<description></description>
	<lastBuildDate>Thu, 29 Jul 2010 19:14:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>The Aliens Aliases Have Landed</title>
		<link>http://mitcho.com/blog/projects/the-aliens-aliases-have-landed/</link>
		<comments>http://mitcho.com/blog/projects/the-aliens-aliases-have-landed/#comments</comments>
		<pubDate>Sat, 05 Sep 2009 00:46:56 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[alias]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[command]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[internationalization]]></category>
		<category><![CDATA[l10n]]></category>
		<category><![CDATA[localization]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[verb]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2841</guid>
		<description><![CDATA[This week I implemented a new way to customize and extend Ubiquity commands: CmdUtils.CreateAlias. The use case for and importance of CreateAlias CreateAlias lets you easily create a special-case alias of another, more generic verb. Ubiquity comes bundled with useful verbs like translate and search which can be used for a number of different uses [...]


]]></description>
			<content:encoded><![CDATA[<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/09/close-encounters.jpg" alt="close-encounters.jpg" border="0" width="640" height="300" /></p>

<p>This week <a href="http://ubiquity.mozilla.com/trac/ticket/201">I implemented</a> a new way to customize and extend Ubiquity commands: <code>CmdUtils.CreateAlias</code>.</p>

<h3>The use case for and importance of <code>CreateAlias</code></h3>

<p><code>CreateAlias</code> lets you easily create a special-case alias of another, more generic verb. Ubiquity comes bundled with useful verbs like <code>translate</code> and <code>search</code>, which serve a number of different purposes depending on their arguments. In some cases, and in some languages, however, typing out <code>translate to English</code> or <code>search with Google</code> is <a href="http://mitcho.com/blog/projects/how-natural-should-a-natural-interface-be/">unnatural</a>, as there is a more succinct and direct way to make that request. For example, in English one could say &#8220;anglicize&#8221; or &#8220;google&#8221;, respectively, for the verbs and arguments above. Indeed, in order to support both <code>search with Google</code> and <code>google</code>, Ubiquity has traditionally implemented two different verbs, <code>search</code> and <code>google</code>, which duplicate functionality and code.</p>

<p><code>CreateAlias</code> lets us create such natural aliases <a href="http://en.wikipedia.org/wiki/Don%27t_repeat_yourself">without repeating ourselves</a>. We can easily create an <code>anglicize</code> verb which, in one word, does the work of <code>translate to English</code>, or <code>google</code> which is semantically equivalent to <code>search with Google</code>.</p>

<p>These sorts of aliases become particularly important in our perpetual quest to internationalize Ubiquity. One point that came up early on our <a href="http://groups.google.com/group/ubiquity-i18n">Ubiquity-i18n</a> list is that not all languages have the verb &#8220;Google&#8221;: in many languages it is necessary to explicitly say &#8220;search with Google&#8221;. Moreover, other languages may have domain-specific verbs which English doesn&#8217;t have. Maybe some language has a special verb for &#8220;email with Hotmail&#8221; or &#8220;map Denmark&#8221;. Who knows? With <code>CreateAlias</code> we can easily enable such localizations based on the more generic commands bundled with Ubiquity.</p>

<h3>Creating an alias</h3>

<p><code>CreateAlias</code> was designed to be incredibly simple to use. Here&#8217;s an example that will be bundled (but not installed by default) in Ubiquity:</p>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;">CmdUtils.<span style="color: #660066;">CreateAlias</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#123;</span>
  names<span style="color: #339933;">:</span> <span style="color: #009900;">&#91;</span><span style="color: #3366CC;">&quot;anglicize&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span>
  verb<span style="color: #339933;">:</span> <span style="color: #3366CC;">&quot;translate&quot;</span><span style="color: #339933;">,</span>
  givenArgs<span style="color: #339933;">:</span> <span style="color: #009900;">&#123;</span> goal<span style="color: #339933;">:</span> <span style="color: #3366CC;">&quot;English&quot;</span> <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>


<p>As you see, this syntax is incredibly straightforward. There are two required properties, <code>names</code>, an array of names for the alias, and <code>verb</code>, a reference to the target verb that this alias should use.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<p>The alias can also have a <code>givenArgs</code> property which is a hash of pre-specified arguments with their <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2/Semantic_Roles">semantic roles</a>. Because <code>translate</code> takes three arguments (an <code>object</code> text, a <code>goal</code> language, and a <code>source</code> language) but we have pre-specified a <code>goal</code> in the <code>givenArgs</code>, the new <code>anglicize</code> command will only take two arguments: the <code>object</code> text and a <code>source</code> language. Of course, if you specify no <code>givenArgs</code>, you&#8217;ll get a simple synonym, without needing access to the original verb&#8217;s code.</p>
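<p>To make the argument bookkeeping concrete, here is a minimal standalone sketch of the idea behind an alias. It is not Ubiquity&#8217;s implementation: the <code>verbs</code> registry, the <code>createAlias</code> function, and the <code>execute(args)</code> shape are hypothetical names made up for illustration. Only the <code>names</code>, <code>verb</code>, and <code>givenArgs</code> properties come from the example above.</p>

```javascript
// Hypothetical verb registry; not Ubiquity's real data structure.
const verbs = {
  translate: {
    roles: ["object", "goal", "source"],
    execute(args) {
      const from = args.source || "auto";
      return "translate '" + args.object + "' from " + from + " to " + args.goal;
    },
  },
};

// Sketch of an alias: it hides the roles fixed by givenArgs and
// forwards everything else to the target verb.
function createAlias({ names, verb, givenArgs = {} }) {
  const target = verbs[verb];
  if (!target) throw new Error("Unknown verb: " + verb);
  const alias = {
    names,
    // Roles the user can still fill in: those not pre-specified.
    roles: target.roles.filter((role) => !(role in givenArgs)),
    execute(args) {
      // Pre-specified arguments win over anything supplied for the same role.
      return target.execute(Object.assign({}, args, givenArgs));
    },
  };
  const primaryName = names[0];
  verbs[primaryName] = alias;
  return alias;
}

const anglicize = createAlias({
  names: ["anglicize"],
  verb: "translate",
  givenArgs: { goal: "English" },
});
```

<p>In this sketch <code>anglicize.roles</code> contains only <code>object</code> and <code>source</code>, mirroring how the real alias drops the pre-specified <code>goal</code> argument.</p>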

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/09/anglicize1.png" alt="anglicize.png" border="0" width="650" height="152" /></p>

<p>As you see, the preview of this command is simply the preview of the <code>translate</code> verb. Its preview and execution are just as if you had entered <code>translate こんにちは to English</code>.</p>

<p>Just like other commands created with <code>CreateCommand</code>, the object specifying the alias can also have properties like <code>help</code>, <code>description</code>, <code>author</code> information, and so on. I used the <code>icon</code> property to add a <a href="http://en.wikipedia.org/wiki/Union_Jack">Union Jack</a> to it so that it was easily identifiable.</p>

<h3>Bonus: using <code>CmdUtils.previewCommand</code> and <code>CmdUtils.executeCommand</code></h3>

<p>On the road to implementing <code>CreateAlias</code>, I also implemented the <code>CmdUtils.previewCommand</code> and <code>CmdUtils.executeCommand</code> functions. The majority of this code comes from previous work by <a href="http://groups.google.com/group/ubiquity-firefox/browse_thread/thread/993411167fc6f165">Louis-Rémi Babé</a>, though I adapted it to the modern Ubiquity system. Using <code>previewCommand</code> and <code>executeCommand</code> you can take advantage of the preview or execute functionality of another command. In the new <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/tip/ubiquity/standard-feeds/alias-commands.js">alias-commands</a> feed I included a command called <code>germanize</code>, which is essentially a straightforward analog of <code>anglicize</code>, seen above, but uses these functions within a <code>CreateCommand</code>. <code>CreateAlias</code> is much more straightforward for simple aliases. For more complex subcommands, where you would like to adapt another verb&#8217;s execution or preview, or take only one of those and re-implement the other, <code>previewCommand</code> and <code>executeCommand</code> are the way to go.</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>The <code>verb</code> reference can be the canonical or <em>reference name</em> of a command, which is the first name in the <code>names</code> of a command (also the name listed in the command list when Ubiquity is running in English) or the actual internal ID of the command, which looks like <code>resource://ubiquity/standard-feeds/general.html#translate</code>.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>


<p>Related posts:<ol><li><a href='http://mitcho.com/blog/projects/ubiquity-localization-whats-new-whats-next/' rel='bookmark' title='Permanent Link: Ubiquity Localization: What&#8217;s New, What&#8217;s Next'>Ubiquity Localization: What&#8217;s New, What&#8217;s Next</a></li>
<li><a href='http://mitcho.com/blog/projects/rolling-out-the-roles/' rel='bookmark' title='Permanent Link: Rolling out the Roles'>Rolling out the Roles</a></li>
<li><a href='http://mitcho.com/blog/projects/localizing-commands-for-ubiquity-0-5/' rel='bookmark' title='Permanent Link: Localizing Commands for Ubiquity 0.5'>Localizing Commands for Ubiquity 0.5</a></li>
</ol></p>
<p>Related posts brought to you by <a href='http://mitcho.com/code/yarpp/'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/the-aliens-aliases-have-landed/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Nountype Quirks: Day 3: Geo Day</title>
		<link>http://mitcho.com/blog/projects/nountype-quirks-day-3/</link>
		<comments>http://mitcho.com/blog/projects/nountype-quirks-day-3/#comments</comments>
		<pubDate>Sat, 01 Aug 2009 04:20:22 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[localization]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[nountypes]]></category>
		<category><![CDATA[scoring]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2647</guid>
		<description><![CDATA[It&#8217;s time for one more installment of Nountype Quirks, where I review and tweak Ubiquity&#8217;s built-in nountypes. For an introduction to this effort, please read Judging Noun Types and my updates from Day 1 and Day 2. Today I ended up spending [...]


]]></description>
			<content:encoded><![CDATA[<p><style type='text/css'>
.scorebar {
  background-color:red;
  display:inline-block;
  height:0.5em;
  vertical-align:middle;
}
.scoretable td {
  font-size: 0.7em;
}
</style></p>

<p>It&#8217;s time for one more installment of Nountype Quirks, where I review and tweak <a href="http://ubiquity.mozilla.com">Ubiquity</a>&#8217;s built-in nountypes. For an introduction to this effort, please read <a href="http://mitcho.com/blog/projects/judging-noun-types/">Judging Noun Types</a> and my updates from <a href="http://mitcho.com/blog/projects/nountype-quirks-day-1/">Day 1</a> and <a href="http://mitcho.com/blog/projects/nountype-quirks-day-2/">Day 2</a>.</p>

<p>Today I ended up spending most of the day attempting to implement (but not yet completing) major improvements to the geolocation-related nountypes whose plans I lay out here.</p>

<p><em>Note: this blog post includes a number of graphs using HTML/CSS formatting. If you are reading this article through a feed reader or planet, I invite you to read it <a href="http://mitcho.com/blog/projects/nountype-quirks-day-3/">on my site</a>.</em><span id="more-2647"></span></p>

<h3><code>noun_type_geolocation</code></h3>

<p><code>noun_type_geolocation</code> is the nountype used by the <code>weather</code> command for its location argument in input like &#8220;weather near Chicago&#8221;. The neat feature of <code>noun_type_geolocation</code> is that it has a smart default value which uses Firefox&#8217;s geolocation system to give you your current location by default, so I can enter &#8220;weather&#8221; and get the suggestion &#8220;weather near Broomfield, Colorado&#8221; (not completely correct, but close enough for the weather). Otherwise, however, <code>noun_type_geolocation</code> does not do too hot&#8230; for any input you give it, it&#8217;ll just accept it with a score of 0.3, much like <code>noun_arb_text</code>. We could do better.</p>

<p>One issue with this <code>noun_type_geolocation</code> is a conceptual one. Is this nountype supposed to accept only municipalities? Countries? Or should it accept landmarks or addresses as well? Part of the issue is that it&#8217;s only used by one built-in command in Ubiquity now, <code>weather</code>. But to be called a general &#8220;geolocation&#8221; nountype, its output should not be specific to <code>weather</code>&#8217;s usage, which is to throw the result at the <a href="http://wunderground.com">Weather Underground</a> API.</p>

<p>I propose that we change this to be something like <code>noun_type_geo_town</code> and also make similar nountypes like <code>noun_type_geo_country</code>, <code>noun_type_geo_region</code>, going all the way down to <code>noun_type_address</code> (which already exists—see below). All of the nountypes in this family could use a geocoding API such as <a href="http://code.google.com/apis/maps/documentation/geocoding/index.html">Google&#8217;s</a> or <a href="http://developer.yahoo.com/maps/rest/V1/geocode.html">Yahoo&#8217;s</a>. Their <code>data</code> properties could include all of this geocoded geographic data (in English) and also the latitude/longitude coordinate data.</p>

<p>The <code>weather</code> command could then accept <code>noun_type_geo_town</code>. However, some municipalities are not in Weather Underground, and for some countries it is only as granular as administrative districts. We could therefore display the results of the geocoding API to the user but give Weather Underground the geocoded latitude/longitude data.</p>

<h3><code>noun_type_async_address</code></h3>

<p><code>noun_type_async_address</code> attempts to do exactly what I&#8217;ve laid out above for the most granular level: that of geolocations with data all the way down to the street level. This is the nountype which is used for the built-in <code>map</code> command and uses the <a href="http://developer.yahoo.com/maps/rest/V1/geocode.html">Yahoo geocoding service</a> to accomplish this. Let&#8217;s see what kinds of results it returns:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='1'>mitcho</td><td>mitcho</td><td><span class='scorebar' style='width:250px'></span> 0.5</td></tr>
<tr><td rowspan='1'>grenada</td><td>grenada</td><td><span class='scorebar' style='width:450px'></span> 0.9</td></tr>
<tr><td>jono</td><td>jono</td><td><span class='scorebar' style='width:450px'></span> 0.9</td></tr>
<tr><td>mountain view</td><td>mountain view</td><td><span class='scorebar' style='width:450px'></span> 0.9</td></tr>
</table>

<p>Let&#8217;s lay out some immediate quirks:</p>

<ol>
<li>All scores are either 0.5 or 0.9. In general, if the Yahoo API returns some geocoded interpretation, it gets 0.9, but otherwise it accepts everything with 0.5.</li>
<li>The results that come back from the Yahoo service don&#8217;t add any useful information like the country or administrative region. Even the case stays lowercase.</li>
<li>Since when is Jono a location!? I&#8217;ll get back to this later.</li>
</ol>

<p>For starters, the Yahoo! Maps API terms of service dictate that we can&#8217;t use its geocoding service if we&#8217;re not also displaying Yahoo maps, so I rewrote it using the Google API which also had the advantage of offering JSON output.</p>

<p>One quirk of the Google Geocoding API, though, is that all of the resulting municipality names are only in English. Try for example queries for <a href="http://maps.google.com/maps/geo?q=Wien&amp;output=json&amp;oe=utf8&amp;sensor=false">Wien</a> or <a href="http://maps.google.com/maps/geo?q=%E6%9D%B1%E4%BA%AC&amp;output=json&amp;oe=utf8&amp;sensor=false">東京 (Tokyo)</a>. Since we want our suggestions to only add information to our input, not replace the input entirely (and especially not in another language), we&#8217;ll then only take results which have the input as an initial substring. On the other hand, if none of the results have the input as a proper prefix of the return value, we will take the geocoding information from the first result but with the original input as the display text. Such results will have a markedly lower score.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<p>As this is the <code>address</code> nountype, we&#8217;ll penalize results which do not have detailed information such as street address or town-level information. All of this is very easy to judge as every result from the API has a <a href="http://code.google.com/intl/ja/apis/maps/documentation/geocoding/index.html#GeocodingAccuracy">geocoding accuracy</a> value.</p>
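<p>The selection rules from the last two paragraphs can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the code in the changeset: the result shape (<code>name</code>, <code>accuracy</code>), the function name <code>suggestAddresses</code>, the maximum accuracy value, and every score constant are hypothetical.</p>

```javascript
// Assumption: each geocoder result looks like { name, accuracy }, where
// a higher accuracy means more detail (street level rather than country).
const MAX_ACCURACY = 9; // assumed top of the accuracy scale

// Assumed base scores: results that extend the input vs. the fallback.
const PREFIX_SCORE = 0.9;
const FALLBACK_SCORE = 0.4;

function suggestAddresses(input, results) {
  const needle = input.toLowerCase();
  // Penalize coarse results: scale the base score by the accuracy level.
  const scoreFor = (result, base) =>
    base * (0.5 + 0.5 * (result.accuracy / MAX_ACCURACY));

  // Keep only results that add information to what was typed.
  const prefixed = results.filter((r) =>
    r.name.toLowerCase().startsWith(needle)
  );
  if (prefixed.length > 0) {
    return prefixed.map((r) => ({
      text: r.name,
      data: r,
      score: scoreFor(r, PREFIX_SCORE),
    }));
  }

  // Otherwise keep the first result's data, display the original input,
  // and give the suggestion a markedly lower score.
  if (results.length === 0) return [];
  const first = results[0];
  return [{ text: input, data: first, score: scoreFor(first, FALLBACK_SCORE) }];
}
```

<p>With these made-up constants, an input like &#8220;Wien&#8221; that only comes back as an English name keeps &#8220;Wien&#8221; as its display text and scores well below an input that is a prefix of the returned name.</p>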

<h3>The best laid plans of mice and men&#8230;</h3>

<p>I spent a good few hours this afternoon and evening <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/rev/377daf3fe57a">attempting to implement</a> this new family of nountypes, including this new <code>nountype_geo_address</code>, but also <code>nountype_geo_subregion</code>, <code>nountype_geo_region</code>, and <code>nountype_geo_country</code>. Some of the quirks of the <code>weather</code> and <code>map</code> commands, however, have prevented me from completely replacing the legacy <code>noun_type_address</code> and <code>noun_type_geolocation</code> described above. I hope to continue this work again soon and actually make this transition, ideally before 0.5.2.</p>

<p>Look forward to one (or maybe two?) more episode(s) of Nountype Quirks where I hope to definitively explain, analyze, and tweak <code>matchScore</code>, the scoring algorithm which underlies the majority of the nountypes in Ubiquity. As always, I look forward to your comments and feedback.</p>

<h3>Bonus: Where&#8217;s Jono?</h3>

<p>It turns out that <code>noun_type_async_address</code> was recognizing &#8220;Jono&#8221; as an address because Jono is actually a location after all! Not only that, but Jono is in Japan!!</p>

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/08/Picture-31.png" alt="Picture 3.png" border="0" width="594" height="525" /></p>

<p>You clearly <a href="http://jonoscript.files.wordpress.com/2009/06/ubiquity_japanese.png">can&#8217;t take Japan out of Jono</a>, but it turns out you can&#8217;t take Jono out of Japan either.</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>If this crazy algorithm raises a red flag for anyone, you&#8217;re not alone&#8230; if you think of a more elegant solution, please let me know. This will no doubt be an issue when it comes to localizing the <code>address</code> nountype as well. I wish we could specify an output language for the Google Geocoding API&#8230; <img src='http://mitcho.com/blog/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' /> &#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>


<p>Related posts:<ol><li><a href='http://mitcho.com/blog/projects/nountype-quirks-day-2/' rel='bookmark' title='Permanent Link: Nountype Quirks: Day 2'>Nountype Quirks: Day 2</a></li>
<li><a href='http://mitcho.com/blog/projects/nountype-quirks-day-1/' rel='bookmark' title='Permanent Link: Nountype Quirks: Day 1'>Nountype Quirks: Day 1</a></li>
<li><a href='http://mitcho.com/blog/projects/judging-noun-types/' rel='bookmark' title='Permanent Link: Judging Noun Types'>Judging Noun Types</a></li>
</ol></p>
<p>Related posts brought to you by <a href='http://mitcho.com/code/yarpp/'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/nountype-quirks-day-3/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Nountype Quirks: Day 2</title>
		<link>http://mitcho.com/blog/projects/nountype-quirks-day-2/</link>
		<comments>http://mitcho.com/blog/projects/nountype-quirks-day-2/#comments</comments>
		<pubDate>Thu, 30 Jul 2009 22:44:52 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[localization]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[nountypes]]></category>
		<category><![CDATA[scoring]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2635</guid>
		<description><![CDATA[Today I&#8217;m continuing the process of reviewing and tweaking all of the nountypes built into Ubiquity. For a more respectable introduction to this endeavor, please read my blog post from a couple days ago, Judging Noun Types and my status update from [...]


]]></description>
			<content:encoded><![CDATA[<p><style type='text/css'>
.scorebar {
  background-color:red;
  display:inline-block;
  height:0.5em;
  vertical-align:middle;
}
.scoretable td {
  font-size: 0.7em;
}
</style></p>

<p>Today I&#8217;m continuing the process of reviewing and tweaking all of the nountypes built into <a href="http://ubiquity.mozilla.com">Ubiquity</a>. For a more respectable introduction to this endeavor, please read my blog post from a couple days ago, <a href="http://mitcho.com/blog/projects/judging-noun-types/">Judging Noun Types</a>, and my status update from yesterday, <a href="http://mitcho.com/blog/projects/nountype-quirks-day-1/">Nountype Quirks: Day 1</a>.</p>

<p><em>Note: this blog post includes a number of graphs using HTML/CSS formatting. If you are reading this article through a feed reader or planet, I invite you to read it <a href="http://mitcho.com/blog/projects/nountype-quirks-day-2/">on my site</a>.</em></p>

<p><span id="more-2635"></span></p>

<h3><code>noun_type_twitter_user</code></h3>

<p>Let&#8217;s begin again by considering the suggestions and scores that a variety of different inputs to this nountype return and see what quirks we find.</p>

<p>To test this nountype, I made sure I had logged into <a href="http://twitter.com">Twitter</a> once with the login <a href="http://twitter.com/mitchoyoshitaka/"><code>mitchoyoshitaka</code></a>.</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='2'>mitcho</td><td>mitchoyoshitaka</td><td><span class='scorebar' style='width:425px'></span> 0.85</td></tr>
<tr><td>mitcho</td><td><span class='scorebar' style='width:250px'></span> 0.5</td></tr>
<tr><td rowspan='2'>mitchoyoshi</td><td>mitchoyoshitaka</td><td><span class='scorebar' style='width:470px'></span> 0.94</td></tr>
<tr><td>mitcho</td><td><span class='scorebar' style='width:250px'></span> 0.5</td></tr>
<tr><td>test</td><td>test</td><td><span class='scorebar' style='width:250px'></span> 0.5</td></tr>
<tr><td>テスト</td><td><i>none</i></td><td></td></tr>
<tr><td>hello world</td><td><i>none</i></td><td></td></tr>
<tr><td>@test</td><td><i>none</i></td><td></td></tr>
</table>

<p><a href="http://mitcho.com/blog/projects/nountype-quirks-day-1/">As nountypes go</a>, this is looking pretty good. For usernames which look like logins we&#8217;ve saved before, we&#8217;re using <code>matchScore</code> to get decent differential scores.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> It&#8217;s even ruling out impossible Twitter username strings, according to Twitter&#8217;s own restrictions:</p>

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/twitter-usernames.png" alt="twitter-usernames.png" border="0" width="574" height="75" /></p>

<p>One possible improvement we could make is to let @ strings be accepted. I <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/rev/97871e3a453c">went ahead and made this improvement</a>. The initial @ is stripped off and the rest of the input is checked as normal, but the final score receives a slight boost using an <a href="http://en.wikipedia.org/wiki/nth_root"><i>n</i>th root</a> formula. The <code>twitter</code> command was also updated to deal with inputs with and without the initial @.</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='2'>mitcho</td><td>mitchoyoshitaka</td><td><span class='scorebar' style='width:425px'></span> 0.85</td></tr>
<tr><td>mitcho</td><td><span class='scorebar' style='width:250px'></span> 0.5</td></tr>
<tr><td rowspan='2'>@mitcho</td><td>@mitchoyoshitaka</td><td><span class='scorebar' style='width:440px'></span> 0.88</td></tr>
<tr><td>@mitcho</td><td><span class='scorebar' style='width:285px'></span> 0.57</td></tr>

<tr><td>test</td><td>test</td><td><span class='scorebar' style='width:250px'></span> 0.5</td></tr>
<tr><td>@test</td><td>@test</td><td><span class='scorebar' style='width:285px'></span> 0.57</td></tr>
</table>
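<p>The @ handling can be sketched in a few lines. This is a hedged illustration rather than the committed code: <code>scoreTwitterUser</code> and both constants are hypothetical, the username pattern is my assumption about which strings are allowed, and the exponent of 0.8 (a 1.25th root) was picked only because it reproduces the 0.5 to 0.57 and 0.85 to 0.88 jumps in the table above. The base score is passed in by the caller, where the real nountype gets it from <code>matchScore</code>.</p>

```javascript
// Assumption: usernames consist of letters, digits, and underscores only.
const USERNAME_PATTERN = /^\w+$/;

// Assumption: raising a score between 0 and 1 to the power 0.8 (taking
// its 1.25th root) nudges it upward. The real constant may differ.
const AT_BOOST_EXPONENT = 0.8;

function scoreTwitterUser(input, baseScore) {
  const hasAt = input.startsWith("@");
  // Strip the initial @ and validate the rest as a normal username.
  const name = hasAt ? input.slice(1) : input;
  if (!USERNAME_PATTERN.test(name)) return null;

  // Only @-prefixed inputs receive the root-based boost.
  const score = hasAt ? Math.pow(baseScore, AT_BOOST_EXPONENT) : baseScore;
  return { text: input, score: Math.round(score * 100) / 100 };
}
```

<p>A root is a convenient boost because it raises any score strictly between 0 and 1 without ever pushing it past 1.</p>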

<p>Although the <code>noun_type_twitter_user</code> nountype is currently used mainly by the built-in <code>twitter</code> command to specify the user&#8217;s username, in theory it could also be used, for example, in a command which pulls up another user&#8217;s tweets. With that in mind, perhaps in the future we could check the browser history and/or bookmarks for entries of the form <code>http://twitter.com/...</code> and suggest those as well (<a href="http://ubiquity.mozilla.com/trac/ticket/846">trac #846</a>).</p>

<h3><code>noun_type_number</code></h3>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='1'>text</td><td><i>none</i></td><td></td></tr>
<tr><td>0.5</td><td>0.5</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td>0.5.1</td><td><i>none</i></td><td></td></tr>
</table>

<p>This nountype has an incredibly simple job and does it with ease. I&#8217;m going to leave it alone.</p>
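<p>For comparison with the other nountypes, the behavior in the table fits in a few lines. This is a hypothetical stand-in (<code>suggestNumber</code> is not the real function name, and the exact pattern is an assumption), shown only to illustrate the all-or-nothing scoring.</p>

```javascript
// Hypothetical stand-in for a number nountype: plain decimal numbers are
// accepted with full confidence and everything else is rejected.
function suggestNumber(input) {
  const text = input.trim();
  // Optional sign, then digits with at most one decimal point.
  if (!/^[+-]?(\d+\.?\d*|\.\d+)$/.test(text)) return [];
  return [{ text, data: Number(text), score: 1 }];
}
```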

<h3><code>noun_type_date</code> and <code>noun_type_time</code></h3>

<p><code>noun_type_date</code> and <code>noun_type_time</code> both use the magical <a href="https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/Date/parse">Date.parse</a> method to parse date- and time-like strings. Let&#8217;s first take a look at some of its suggestions:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th><code>date</code> suggestion</th><th><code>time</code> suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="250" height="15" /></th></tr>
<tr><td rowspan='1'>June 8th 5pm</td><td>2009-06-08</td><td>05:00 PM</td><td><span class='scorebar' style='width:250px'></span> 1</td></tr>
<tr><td rowspan='1'>5pm</td><td>2009-07-30</td><td>05:00 PM</td><td><span class='scorebar' style='width:250px'></span> 1</td></tr>
<tr><td rowspan='1'>5</td><td>2009-07-05</td><td>12:00 AM</td><td><span class='scorebar' style='width:250px'></span> 1</td></tr>
<tr><td rowspan='1'>June 8th</td><td>2009-06-08</td><td>12:00 AM</td><td><span class='scorebar' style='width:250px'></span> 1</td></tr>
<tr><td rowspan='1'>today</td><td>2009-07-30</td><td>12:00 AM</td><td><span class='scorebar' style='width:250px'></span> 1</td></tr>
<tr><td rowspan='1'>now</td><td>2009-07-30</td><td>02:40 PM</td><td><span class='scorebar' style='width:250px'></span> 1</td></tr>
<tr><td rowspan='1'>5pm is a good time</td><td><i>none</i></td><td><i>none</i></td><td></td></tr>
</table>

<p>The quirks in these outputs can be summed up in two points:</p>

<ol>
<li>There is no differential scoring at all.</li>
<li>Both nountypes parse the input with <a href="https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/Date/parse">Date.parse</a> and then just spit out the date or time components of the result. Thus time-only inputs get the default date and date-only inputs get the default time with equal scores.</li>
</ol>

<p>I just rewrote both nountypes and also added a new <code>noun_type_date_time</code>. Here are some of the features of the new implementation:</p>

<ol>
<li>If the input only contains digits and spaces, it is marked down.</li>
<li>With the exception of the inputs &#8216;today&#8217; and &#8216;now&#8217;, if the resulting <code>Date</code> object&#8217;s date is today, its date suggestion is scored lower; likewise, a time suggestion is scored lower if its time is the default value, &#8220;12:00 AM&#8221;.</li>
<li>Inputs (with the exception of &#8216;today&#8217; and &#8216;now&#8217;) which are shorter than the output string get a slight penalty. This factor reflects the intuition that an output longer than the input means some generic information was added, and thus there is less confidence in the output.</li>
</ol>

<p>Here&#8217;s what some of the inputs give now:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>

<tr><td rowspan='3'>June 8th 5pm</td><td><code>date</code>: 2009-06-08</td><td><span class='scorebar' style='width:350px'></span> 0.7</td></tr>
<tr><td><code>time</code>: 05:00 PM</td><td><span class='scorebar' style='width:350px'></span> 0.7</td></tr>
<tr><td><code>date_time</code>: 2009-06-08&#160;05:00 PM</td><td><span class='scorebar' style='width:430px'></span> 0.86</td></tr>

<tr><td rowspan='3'>5pm</td><td><code>date</code>: 2009-07-30</td><td><span class='scorebar' style='width:135px'></span> 0.27</td></tr>
<tr><td><code>time</code>: 05:00 PM</td><td><span class='scorebar' style='width:405px'></span> 0.81</td></tr>
<tr><td><code>date_time</code>: 2009-07-30&#160;05:00 PM</td><td><span class='scorebar' style='width:245px'></span> 0.49</td></tr>

<tr><td rowspan='3'>5</td><td><code>date</code>: 2009-07-05</td><td><span class='scorebar' style='width:265px'></span> 0.53</td></tr>
<tr><td><code>time</code>: 12:00 AM</td><td><span class='scorebar' style='width:95px'></span> 0.19</td></tr>
<tr><td><code>date_time</code>: 2009-07-05&#160;12:00 AM</td><td><span class='scorebar' style='width:170px'></span> 0.34</td></tr>

<tr><td rowspan='3'>June 8th</td><td><code>date</code>: 2009-06-08</td><td><span class='scorebar' style='width:475px'></span> 0.95</td></tr>
<tr><td><code>time</code>: 12:00 AM</td><td><span class='scorebar' style='width:175px'></span> 0.35</td></tr>
<tr><td><code>date_time</code>: 2009-06-08&#160;12:00 AM</td><td><span class='scorebar' style='width:290px'></span> 0.58</td></tr>

<tr><td rowspan='3'>today</td><td><code>date</code>: 2009-07-30</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td><code>time</code>: 12:00 AM</td><td><span class='scorebar' style='width:225px'></span> 0.45</td></tr>
<tr><td><code>date_time</code>: 2009-07-30&#160;12:00 AM</td><td><span class='scorebar' style='width:350px'></span> 0.7</td></tr>

<tr><td rowspan='3'>now</td><td><code>date</code>: 2009-07-30</td><td><span class='scorebar' style='width:350px'></span> 0.7</td></tr>
<tr><td><code>time</code>: 04:34 PM</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td><code>date_time</code>: 2009-07-30&#160;04:34 PM</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>

</table>

<p>Looking to the future, we&#8217;d also <a href="http://mitcho.com/blog/projects/ubiquity-localization-whats-new-whats-next/">like to make nountypes localizable</a>, and the date and time nountypes in particular will surely require careful thought and planning to localize.</p>

<h3><code>noun_type_email</code> and <code>noun_type_contact</code></h3>

<p><code>noun_type_email</code> and <code>noun_type_contact</code> are two closely related nountypes. <code>noun_type_email</code> simply validates email address-looking strings, while <code>noun_type_contact</code> will return the <code>noun_type_email</code> suggestions and additionally return contacts from GMail if available.</p>

<p>The first thing to note is that I&#8217;ve often found the GMail contact lookup to be finicky in my own use. Reading through the code, I discovered why: for the lookup to work, either GMail must be open in a tab, or you must have signed in with the &#8220;stay signed in&#8221; option before closing the GMail tab.<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup> With this mystery solved, and <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/rev/8478c7103753">some code cleanup done to this contact fetching</a>, let&#8217;s take a look at some example suggestions (suggestions overlapping with <code>noun_type_email</code> are not listed here):</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='1'>aza@m</td><td>aza@mozilla.com</td><td><span class='scorebar' style='width:210px'></span> 0.42</td></tr>
<tr><td rowspan='1'>jono</td><td>jdicarlo@mozilla.com</td><td><span class='scorebar' style='width:140px'></span> 0.28</td></tr>
<tr><td rowspan='1'>jdicarlo</td><td>jdicarlo@mozilla.com</td><td><span class='scorebar' style='width:95px'></span> 0.19</td></tr>
</table>

<p>In general, we see that these scores all look pretty poor. In particular, though, note that the &#8220;jono&#8221; input yielded a higher score for the same suggestion than &#8220;jdicarlo&#8221;, even though &#8220;jdicarlo&#8221; is longer and thus, intuitively, has more informational content and should maybe do better. Digging into the code I realized why this is. It was computing the scores by comparing &#8220;jono&#8221; and &#8220;jdicarlo&#8221; not simply to &#8220;Jono DiCarlo&#8221; and &#8220;jdicarlo@mozilla.com&#8221; respectively, but to the combined string &#8220;Jono DiCarlo &lt;jdicarlo@mozilla.com&gt;&#8221;. Now with <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/rev/0877848192f2">this change</a> in place, both the email address and name are analyzed individually and, due to the way nountype detection works in Parser 2, no duplicates are returned. Here are the updated results:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='1'>jono</td><td>jdicarlo@mozilla.com</td><td><span class='scorebar' style='width:415px'></span> 0.83</td></tr>
<tr><td rowspan='1'>jdicarlo</td><td>jdicarlo@mozilla.com</td><td><span class='scorebar' style='width:425px'></span> 0.85</td></tr>
</table>

<p>That&#8217;s much better!</p>

<p>Now let&#8217;s consider the suggestions from <code>noun_type_email</code>. Here is what they originally looked like:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='1'>bpung</td><td><i>none</i></td><td></td></tr>
<tr><td rowspan='1'>bpung@m</td><td>bpung@m</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td rowspan='1'>bpung@mozilla.com</td><td>bpung@mozilla.com</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
</table>

<p><code>noun_type_email</code> is based on <a href="http://blog.livedoor.jp/dankogai/archives/51190099.html">a very robust regular expression</a> for <a href="http://www.ietf.org/rfc/rfc2822.txt">RFC 2822</a>. Unfortunately this means that it completely rules out strings such as &#8220;bpung&#8221; which could be a proper prefix of an email address—something that I&#8217;ve advocated for avoiding before (see footnote 2 of <a href="http://mitcho.com/blog/projects/judging-noun-types/">Judging Noun Types</a>). Moreover, due to a quirk of how nountypes based on regular expressions are scored, all results are given the score of 1.</p>

<p>I <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/rev/0d1803104c7d">just committed a change</a> to improve this behavior. The new version accepts strings which match the username part of the email address spec, without the @ and domain, but with a heavy score penalty.<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup> Moreover, domains whose final label (the <a href="http://en.wikipedia.org/wiki/top level domain">top level domain</a>) is not more than one letter long (unless the domain is an IP address), or which do not contain any periods (.), are penalized as well. Here&#8217;s what the same inputs produce now:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='1'>bpung</td><td>bpung</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
<tr><td rowspan='1'>bpung@m</td><td>bpung@m</td><td><span class='scorebar' style='width:400px'></span> 0.8</td></tr>
<tr><td rowspan='1'>bpung@mozilla.com</td><td>bpung@mozilla.com</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
</table>
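
<p>To make the new behavior concrete, here is a minimal sketch of the scoring described above. It is not the committed code: the function name <code>scoreEmail</code>, the simplified patterns (stand-ins for the RFC 2822 expression), and the exact factors 0.3 and 0.8 are all assumptions, chosen only so that the three inputs in the table come out the same.</p>

```javascript
// Simplified stand-ins for the much stricter RFC 2822 expression (assumption).
const USER_PART = /^[A-Za-z0-9._%+-]+$/;
const DOMAIN_PART = /^[A-Za-z0-9.-]+$/;
const IPV4 = /^\d{1,3}(\.\d{1,3}){3}$/;

// Returns a score between 0 and 1; 0 means "no suggestion".
function scoreEmail(input) {
  const at = input.indexOf("@");
  if (at === -1) {
    // Username only: keep it as a possible prefix, heavily penalized.
    return USER_PART.test(input) ? 0.3 : 0;
  }
  const user = input.slice(0, at);
  const domain = input.slice(at + 1);
  if (!USER_PART.test(user) || !DOMAIN_PART.test(domain)) return 0;
  if (IPV4.test(domain)) return 1; // IP addresses are exempt from the penalty
  const labels = domain.split(".");
  const topLevel = labels[labels.length - 1];
  // Penalize domains with no period, or a final label of one letter or less.
  const weakDomain = labels.length === 1 || 2 > topLevel.length;
  return weakDomain ? 0.8 : 1;
}
```

<p>With these assumed factors, <code>scoreEmail("bpung")</code> gives 0.3, <code>scoreEmail("bpung@m")</code> gives 0.8, and <code>scoreEmail("bpung@mozilla.com")</code> gives 1. Whether the real implementation applies the two domain penalties separately or together is not something the table settles, so the sketch applies a single penalty when either condition holds.</p>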

<h3>Same time, same channel</h3>

<p>I hope this post sheds light on the many changes I made together as well as the underlying thought process. If you don&#8217;t agree with any particular fix or analysis, please comment! I&#8217;ll be back again tomorrow with another installment of Nountype Quirks. Stay tuned!</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>Again, <code>matchScore</code> will be the subject of another blog post in the near future.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:2">
<p>Moreover, due to the way <code>noun_type_contact</code> caches the contact list internally, as long as GMail&#8217;s contacts are available once, you should be able to continue accessing those contacts&#8217; suggestions after logging out of GMail. There are also great performance benefits to this caching. The downside is that we currently have no way to know when to clear the cache, so even if you update your contacts in GMail, those new contacts won&#8217;t appear in Ubiquity until you restart Firefox.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:3">
<p>Perhaps this is a horrible idea, because if executed or previewed, any verb which uses these nountypes would have to deal with arguments which are not valid email addresses. In my mind, though, as long as it doesn&#8217;t actually cause any error, this should be okay. Keep in mind that, given the very low scores given to these suggestions, parses using it would most likely only show up if the verb which requires these nountypes was explicitly given and there are other arguments as well, for example in input like &#8220;email hello to bpung&#8221;. In such a situation, we would rather this suggestion not disappear until we type &#8220;@m&#8221;. If executed, the built-in email verb, for instance, will deal with this gracefully by simply putting the incomplete email address in the To field.&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>


]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/nountype-quirks-day-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nountype Quirks: Day 1</title>
		<link>http://mitcho.com/blog/projects/nountype-quirks-day-1/</link>
		<comments>http://mitcho.com/blog/projects/nountype-quirks-day-1/#comments</comments>
		<pubDate>Wed, 29 Jul 2009 23:00:56 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[nountypes]]></category>
		<category><![CDATA[scoring]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2598</guid>
		<description><![CDATA[Today I began the process of going through all of the nountypes built into Ubiquity using the principles and criteria I laid out yesterday, a task I&#8217;ve had in planning for a while now. As I explained yesterday, improved suggestions and scoring from [...]


]]></description>
			<content:encoded><![CDATA[<p><style type='text/css'>
.scorebar {
  background-color:red;
  display:inline-block;
  height:0.5em;
  vertical-align:middle;
}
.scoretable td {
  font-size: 0.7em;
}
</style></p>

<p>Today I began the process of going through all of the nountypes built into <a href="http://ubiquity.mozilla.com">Ubiquity</a> using <a href="http://mitcho.com/blog/projects/judging-noun-types/">the principles and criteria I laid out yesterday</a>, a task I&#8217;ve had <a href="http://ubiquity.mozilla.com/trac/ticket/746">in planning</a> for a while now. As I explained yesterday, improved suggestions and scoring from the built-in nountypes could directly translate to better and smarter suggestions, resulting in a better experience for all users. Here I&#8217;ll document some of the nountype quirks I&#8217;ve discovered so far and what remedies have been implemented or are planned.</p>

<p><em>Note: this blog post includes a number of graphs using HTML/CSS formatting. If you are reading this article through a feed reader or planet, I invite you to read it <a href="http://mitcho.com/blog/projects/nountype-quirks-day-1/">on my site</a>.</em></p>

<p><span id="more-2598"></span></p>

<h3><code>noun_type_percentage</code></h3>

<p>Here&#8217;s what a few different inputs originally returned:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td>20</td><td>20%</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td>20%</td><td>20%</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td>0.2</td><td>20%</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td>0.2%</td><td>20%</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td>20.0</td><td>2000%</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td>2 hens in the garden</td><td>2%</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
</table>

<p>Let me highlight a couple of obvious quirks:</p>

<ol>
<li>When the numerical expression includes a decimal point and is less than one, it is interpreted as a proportion rather than a percentage, e.g. &#8220;0.2&#8221; → &#8220;20%&#8221;; &#8220;0.2%&#8221; is not even offered as an option. This happens even when a % sign is explicitly included.</li>
<li>All suggestions, including those where the numeral was extracted from a long string of text (e.g. &#8220;2 hens in the garden&#8221;), get the same score of 1.</li>
</ol>

<p>I just <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/rev/c3cd4af0f06a">committed a fix</a> so <code>noun_type_percentage</code> now&#8230;</p>

<ol>
<li>Counts the number of characters in the input which match <code>[\d.%]</code> and caps the score at (number of acceptable characters)/(length of input).</li>
<li>Gives strings which do not include &#8220;%&#8221; a 10% penalty.</li>
<li>Also suggests the proportion interpretation (e.g. &#8220;0.2&#8221; → &#8220;20%&#8221;) for decimals less than 1 without a % sign, in addition to the literal suggestion (&#8220;0.2%&#8221;), but with a slight penalty.</li>
</ol>

<p>Here is what they now return:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td>20</td><td>20%</td><td><span class='scorebar' style='width:450px'></span> 0.9</td></tr>
<tr><td>20%</td><td>20%</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td rowspan='2'>0.2</td><td>0.2%</td><td><span class='scorebar' style='width:450px'></span> 0.9</td></tr>
<tr><td>20%</td><td><span class='scorebar' style='width:405px'></span> 0.81</td></tr>
<tr><td>0.2%</td><td>0.2%</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td>20.0</td><td>20%</td><td><span class='scorebar' style='width:450px'></span> 0.9</td></tr>
<tr><td>2 hens in the garden</td><td>2%</td><td><span class='scorebar' style='width:25px'></span> 0.05</td></tr>
</table>
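
<p>For readers who prefer code, the three rules can be sketched roughly as follows. This is an illustration rather than the committed implementation: <code>suggestPercentages</code> is a made-up name, and the second 0.9 factor (for the &#8220;slight penalty&#8221;) is an assumption inferred from the 0.81 in the table.</p>

```javascript
// Assumed factors: 0.9 for a missing "%" (the 10% penalty from the post) and
// 0.9 again for the proportion reading (the post only says "slight").
const NO_PERCENT_FACTOR = 0.9;
const PROPORTION_FACTOR = 0.9;

function suggestPercentages(input) {
  const number = parseFloat(input);
  if (Number.isNaN(number)) return [];
  // Cap: the share of the input made up of digits, periods and percent signs.
  const acceptable = (input.match(/[\d.%]/g) || []).length;
  const cap = acceptable / input.length;
  const hasPercent = input.includes("%");
  const base = hasPercent ? 1 : NO_PERCENT_FACTOR;
  const suggestions = [{ text: number + "%", score: Math.min(base, cap) }];
  const isSmallDecimal = input.includes(".") ? 1 > number : false;
  if (isSmallDecimal) {
    if (!hasPercent) {
      // "0.2" may also mean a proportion, i.e. 20%; round away float noise.
      const percent = Math.round(number * 10000) / 100;
      suggestions.push({
        text: percent + "%",
        score: Math.min(base * PROPORTION_FACTOR, cap),
      });
    }
  }
  return suggestions;
}
```

<p>Under these assumptions, <code>suggestPercentages("2 hens in the garden")</code> yields a single &#8220;2%&#8221; suggestion capped at 1/20 = 0.05, since only one of the twenty characters is acceptable.</p>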

<h3><code>noun_type_tag</code></h3>

<p>Here&#8217;s what a few different inputs originally returned. Keep in mind that currently in this test profile, the preexisting tags are &#8220;animal&#8221;, &#8220;help&#8221;, &#8220;test&#8221;, and &#8220;ubiquity&#8221;.</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td>animal</td><td>animal</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
<tr><td>mineral</td><td>mineral</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
<tr><td rowspan='2'>anim</td><td>animal</td><td><span class='scorebar' style='width:350px'></span> 0.7</td></tr>
<tr><td>anim</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
<tr><td rowspan='2'>help, test, ubiq</td><td>help,test,ubiquity</td><td><span class='scorebar' style='width:350px'></span> 0.7</td></tr>
<tr><td>help,test,ubiq</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
<tr><td rowspan='2'>google, yahoo, ubiq</td><td>google,yahoo,ubiquity</td><td><span class='scorebar' style='width:350px'></span> 0.7</td></tr>
<tr><td>google,yahoo,ubiq</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
<tr><td>google, , yahoo</td><td>google,yahoo</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
</table>

<p>Here are a few of <code>noun_type_tag</code>&#8217;s quirks:</p>

<ol>
<li>There are only two scores ever given out: 0.3 and 0.7.</li>
<li>Only the last tag in the list, and whether or not it already exists, is taken into account.</li>
<li>When the last tag is incomplete, the completion is suggested with a higher score, but if the last tag is <em>exactly</em> equal to an existing tag, it gets the lower score.</li>
</ol>

<p>Ideally, we want <code>noun_type_tag</code> to look at each of the tags given to it, with higher scores for when there are more preexisting tags and fewer new ones. Keep in mind, though, that we only have to suggest the completion of the very last tag as that may be one where the user hasn&#8217;t completed typing yet&#8230; for earlier tags, we can assume (safely or not) that the user placed the comma where they meant to. We can&#8217;t teach Ubiquity to read minds, after all.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<p>With this in mind, I <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/rev/54e6a232ec3a">just made a change</a> to <code>noun_type_tag</code> which aims to follow these principles. The basic idea is that we start with a base score of 0.3 but then raise it via <a href="http://en.wikipedia.org/wiki/nth root"><i>n</i>th root</a> for every tag in the sequence which is preexisting. Here&#8217;s what the same inputs return now. Recall that the preexisting tags are &#8220;animal&#8221;, &#8220;help&#8221;, &#8220;test&#8221;, and &#8220;ubiquity&#8221;.</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td>animal</td><td>animal</td><td><span class='scorebar' style='width:275px'></span> 0.55</td></tr>
<tr><td>mineral</td><td>mineral</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
<tr><td rowspan='2'>anim</td><td>animal</td><td><span class='scorebar' style='width:275px'></span> 0.55</td></tr>
<tr><td>anim</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
<tr><td rowspan='2'>help, test, ubiq</td><td>help,test,ubiquity</td><td><span class='scorebar' style='width:430px'></span> 0.86</td></tr>
<tr><td>help,test,ubiq</td><td><span class='scorebar' style='width:370px'></span> 0.74</td></tr>
<tr><td rowspan='2'>google, yahoo, ubiq</td><td>google,yahoo,ubiquity</td><td><span class='scorebar' style='width:275px'></span> 0.55</td></tr>
<tr><td>google,yahoo,ubiq</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
<tr><td>google, , yahoo</td><td>google,yahoo</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
</table>
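
<p>The scores in this table are consistent with taking one square root of the base score for each preexisting tag, i.e. 0.3<sup>1/2<sup><i>k</i></sup></sup> for <i>k</i> preexisting tags. Here is a small sketch under that reading; the function names are hypothetical and the formula is inferred from the table rather than copied from the commit.</p>

```javascript
const BASE_SCORE = 0.3;

// One square root of the base score per preexisting tag (inferred rule).
function scoreTagList(tags, existingTags) {
  const known = tags.filter((tag) => existingTags.includes(tag)).length;
  return Math.pow(BASE_SCORE, 1 / Math.pow(2, known));
}

function suggestTags(input, existingTags) {
  // Split on commas and drop empty entries such as the one in "google, , yahoo".
  const tags = input
    .split(",")
    .map((tag) => tag.trim())
    .filter((tag) => tag !== "");
  if (tags.length === 0) return [];
  const head = tags.slice(0, -1);
  const last = tags[tags.length - 1];
  // Only the last tag may still be incomplete, so only it gets completions.
  const completions = existingTags
    .filter((tag) => tag.startsWith(last))
    .filter((tag) => tag !== last);
  const candidates = completions.map((tag) => head.concat([tag]));
  candidates.push(tags); // the input exactly as typed
  return candidates.map((list) => ({
    text: list.join(","),
    score: scoreTagList(list, existingTags),
  }));
}
```

<p>For example, with the test profile&#8217;s tags, <code>suggestTags("help, test, ubiq", ["animal", "help", "test", "ubiquity"])</code> returns &#8220;help,test,ubiquity&#8221; (three preexisting tags, about 0.86) followed by &#8220;help,test,ubiq&#8221; (two preexisting tags, about 0.74).</p>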

<h3><code>noun_type_awesomebar</code></h3>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='4'>moz</td><td class="sugg">http://www.mozilla.com/</td><td class="score"><span style="width: 400px;" class="scorebar">&nbsp;</span> 0.8</td></tr>
<tr><td class="sugg">https://wiki.mozilla.org/Labs/Ubiquity/ Parser_2_API_Conversion_Tutorial</td><td class="score"><span style="width: 400px;" class="scorebar">&nbsp;</span> 0.8</td></tr>
<tr><td class="sugg">http://en-us.start3.mozilla.com/ firefox?client=firefox-a&#038;rls= org.mozilla:en-US:official</td><td class="score"><span style="width: 400px;" class="scorebar">&nbsp;</span> 0.8</td></tr>
<tr><td class="sugg">http://en-us.www.mozilla.com/en-US/firefox/about/</td><td class="score"><span style="width: 400px;" class="scorebar">&nbsp;</span> 0.8</td></tr>
</table>

<p>There are a couple of quirks here:</p>

<ol>
<li>All suggestions are returned with the same score.</li>
<li>The nountype returns the URL of the entry as the HTML-formatted result and the title as the text-formatted result, which clearly does not make sense. However, it&#8217;s not clear to me whether the title, URL, or some combination of both is what we should be returning as the suggestion text presented to the user.<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup></li>
</ol>

<p>I <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/rev/cb98c72364db">just rewrote <code>noun_type_awesomebar</code></a> to actually do some differential scoring. This new version also presents whichever of the URL and the title is the better match for the input according to the <code>matchScore</code> function.<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup></p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='4'>moz</td><td class="sugg">www.mozilla.com</td><td class="score"><span style="width: 350px;" class="scorebar">&nbsp;</span> 0.7</td></tr>
<tr><td class="sugg">https://wiki.mozilla.org/Labs/Ubiquity/ Parser_2_API_Conversion_Tutorial</td><td class="score"><span style="width: 315px;" class="scorebar">&nbsp;</span> 0.63</td></tr>
<tr><td class="sugg">http://en-us.start3.mozilla.com/ firefox?client=firefox-a&#038;rls= org.mozilla:en-US:official</td><td class="score"><span style="width: 305px;" class="scorebar">&nbsp;</span> 0.61</td></tr>
<tr><td class="sugg">http://en-us.www.mozilla.com/en-US/firefox/about/</td><td class="score"><span style="width: 300px;" class="scorebar">&nbsp;</span> 0.6</td></tr>
</table>

<h3><code>noun_type_url</code></h3>

<p>The purpose of <code>noun_type_url</code>&#8217;s suggest function is two-fold: first, to accept strings which may look like a URL and, second, to suggest URLs from the history just like <code>noun_type_awesomebar</code>, but only based on URL matches and not title matches.<sup id="fnref:4"><a href="#fn:4" rel="footnote">4</a></sup> Here are a few sample inputs:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='5'>moz</td><td class="sugg">http://www.mozilla.com/</td><td class="score"><span style="width: 450px;" class="scorebar">&nbsp;</span> 0.9</td></tr>
<tr><td class="sugg">http://moz</td><td class="score"><span style="width: 250px;" class="scorebar">&nbsp;</span> 0.5</td></tr>
<tr><td class="sugg">https://wiki.mozilla.org/Labs/Ubiquity/ Parser_2_API_Conversion_Tutorial</td><td class="score"><span style="width: 450px;" class="scorebar">&nbsp;</span> 0.9</td></tr>
<tr><td class="sugg">http://en-us.start3.mozilla.com/ firefox?client=firefox-a&#038;rls= org.mozilla:en-US:official</td><td class="score"><span style="width: 450px;" class="scorebar">&nbsp;</span> 0.9</td></tr>
<tr><td class="sugg">http://en-us.www.mozilla.com/en-US/firefox/about/</td><td class="score"><span style="width: 450px;" class="scorebar">&nbsp;</span> 0.9</td></tr>

<tr><td rowspan='1'>test</td><td class="sugg">http://test</td><td class="score"><span style="width: 250px;" class="scorebar">&nbsp;</span> 0.5</td></tr>
<tr><td rowspan='1'>http://</td><td class="sugg">http://</td><td class="score"><span style="width: 250px;" class="scorebar">&nbsp;</span> 0.5</td></tr>
<tr><td rowspan='1'>http:</td><td class="sugg">http:</td><td class="score"><span style="width: 250px;" class="scorebar">&nbsp;</span> 0.5</td></tr>
<tr><td rowspan='1'>http</td><td class="sugg">http</td><td class="score"><span style="width: 250px;" class="scorebar">&nbsp;</span> 0.5</td></tr>
<tr><td rowspan='1'>_test</td><td class="sugg">http://_test</td><td class="score"><span style="width: 250px;" class="scorebar">&nbsp;</span> 0.5</td></tr>
<tr><td rowspan='1'>hello world!</td><td class="sugg">http://hello world!</td><td class="score"><span style="width: 250px;" class="scorebar">&nbsp;</span> 0.5</td></tr>
</table>

<p>Oh, where to begin!? Here are some initial quirks&#8230; it&#8217;s possible that you could think of more!</p>

<ol>
<li>There is no differential scoring&#8230; only 0.9 for suggestions from history and 0.5 for URL-like strings.</li>
<li>A number of invalid domain names are being accepted and turned into suggestions (&#8220;hello world!&#8221;, &#8220;_test&#8221;, etc.).</li>
<li>It&#8217;s trying to be smart by suggesting &#8220;http://&#8221; as a default <a href="http://en.wikipedia.org/wiki/URI scheme">URI scheme</a> but doing so even for prefixes (initial substrings) of the word &#8220;http&#8221; itself.</li>
</ol>

<p>With these thoughts in mind, I <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/rev/26f179661107">just took a first stab</a> at improving this situation. Here are some features of the new implementation:</p>

<ol>
<li>History entries are scored in the same way as in <code>noun_type_awesomebar</code>, using <code>matchScore</code>.</li>
<li>URLs without an explicit <a href="http://en.wikipedia.org/wiki/URI scheme">URI scheme</a> (like &#8220;http://&#8221;) get a 10% penalty.</li>
<li>&#8220;http://&#8221; is only suggested if none of a long list of common URI schemes is detected.</li>
<li>It repairs schemes which are missing a slash or two, suggesting for example &#8220;http:hello.com&#8221; → &#8220;http://hello.com&#8221;.</li>
<li>It actually uses Firefox&#8217;s own <a href="https://developer.mozilla.org/en/nsIIDNService">IDNService</a> to check if the domain name is a valid <a href="http://en.wikipedia.org/wiki/internationalized domain name">internationalized domain name</a>. If it&#8217;s an IDN as opposed to LDH (&#8220;letters, digits, and hyphens&#8221;), it gets a 10% penalty. If it&#8217;s not even a valid IDN, it is ruled out (see last two example inputs below).</li>
<li>There are also penalties for only being a domain name with no path and for the domain not having any periods (.) in it.</li>
</ol>

<p>Here is what our suggestions now look like:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='5'>moz</td><td class="sugg">http://www.mozilla.com/</td><td class="score"><span style="width: 300px;" class="scorebar">&nbsp;</span> 0.6</td></tr>
<tr><td class="sugg">http://moz</td><td class="score"><span style="width: 325px;" class="scorebar">&nbsp;</span> 0.65</td></tr>
<tr><td class="sugg">https://wiki.mozilla.org/Labs/Ubiquity/ Parser_2_API_Conversion_Tutorial</td><td class="score"><span style="width: 315px;" class="scorebar">&nbsp;</span> 0.63</td></tr>
<tr><td class="sugg">http://en-us.start3.mozilla.com/ firefox?client=firefox-a&#038;rls= org.mozilla:en-US:official</td><td class="score"><span style="width: 305px;" class="scorebar">&nbsp;</span> 0.61</td></tr>
<tr><td class="sugg">http://en-us.www.mozilla.com/en-US/firefox/about/</td><td class="score"><span style="width: 300px;" class="scorebar">&nbsp;</span> 0.6</td></tr>

<tr><td rowspan='1'>test</td><td class="sugg">http://test</td><td class="score"><span style="width: 325px;" class="scorebar">&nbsp;</span> 0.65</td></tr>
<tr><td rowspan='2'>http://</td><td class="sugg">http://</td><td class="score"><span style="width: 500px;" class="scorebar">&nbsp;</span> 1</td></tr>
<tr><td class="sugg">shttp://</td><td class="score"><span style="width: 375px;" class="scorebar">&nbsp;</span> 0.75</td></tr>

<tr><td rowspan='2'>http:</td><td class="sugg">http://</td><td class="score"><span style="width: 450px;" class="scorebar">&nbsp;</span> 0.9</td></tr>
<tr><td class="sugg">shttp://</td><td class="score"><span style="width: 350px;" class="scorebar">&nbsp;</span> 0.7</td></tr>

<tr><td rowspan='4'>http</td><td class="sugg">http://</td><td class="score"><span style="width: 360px;" class="scorebar">&nbsp;</span> 0.72</td></tr>
<tr><td class="sugg">https://</td><td class="score"><span style="width: 355px;" class="scorebar">&nbsp;</span> 0.71</td></tr>
<tr><td class="sugg">shttp://</td><td class="score"><span style="width: 340px;" class="scorebar">&nbsp;</span> 0.68</td></tr>
<tr><td class="sugg">http://http</td><td class="score"><span style="width: 325px;" class="scorebar">&nbsp;</span> 0.65</td></tr>

<tr><td rowspan='1'>_test</td><td class="sugg"><i>none</i></td><td class="score">&nbsp;</td></tr>
<tr><td rowspan='1'>hello world!</td><td class="sugg"><i>none</i></td><td class="score">&nbsp;</td></tr></table>

<h3>See you tomorrow~</h3>

<p>Alright, enough nountype wrangling for one day. I&#8217;ll be back again tomorrow for another installment.</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>If we could make assumptions about what tags look like, for example that they are always fairly short or only use certain character classes, we could use those factors to judge non-preexisting tags for &#8220;tagginess&#8221; as well. Unfortunately, it&#8217;s possible (though unlikely) that a user would prefer really long tag strings, and of course Firefox allows tags in any Unicode range. The only strings we can immediately rule out as impossible are ones which are purely whitespace.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:2">
<p>It&#8217;s unclear whether the method we&#8217;re using (<a href="https://developer.mozilla.org/en/nsIAutoCompleteSearch"><code>nsIAutoCompleteSearch</code></a>) is actually searching titles or not&#8230; it currently looks like it&#8217;s only looking at the URLs. Perhaps the title query is what we&#8217;re supposed to enter in <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=507315">the mystery parameter</a>.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:3">
<p>I hope to discuss the <code>matchScore</code> function in a separate blog post later.&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:4">
<p>While writing up this section I ran into a bug whereby when both <code>noun_type_awesomebar</code> and <code>noun_type_url</code> are active, only one of their async callbacks from <code>Utils.history.search</code> is returned. Thus, if we&#8217;re lucky, only one of the nountypes will return the history results, and if we&#8217;re unlucky the parse query will not complete. Filed as <a href="http://ubiquity.mozilla.com/trac/ticket/845">trac #845</a>.&#160;<a href="#fnref:4" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>


<p>Related posts:<ol><li><a href='http://mitcho.com/blog/projects/nountype-quirks-day-3/' rel='bookmark' title='Permanent Link: Nountype Quirks: Day 3: Geo Day'>Nountype Quirks: Day 3: Geo Day</a></li>
<li><a href='http://mitcho.com/blog/projects/nountype-quirks-day-2/' rel='bookmark' title='Permanent Link: Nountype Quirks: Day 2'>Nountype Quirks: Day 2</a></li>
<li><a href='http://mitcho.com/blog/projects/localizing-commands-for-ubiquity-0-5/' rel='bookmark' title='Permanent Link: Localizing Commands for Ubiquity 0.5'>Localizing Commands for Ubiquity 0.5</a></li>
</ol></p>
<p>Related posts brought to you by <a href='http://mitcho.com/code/yarpp/'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/nountype-quirks-day-1/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Judging Noun Types</title>
		<link>http://mitcho.com/blog/projects/judging-noun-types/</link>
		<comments>http://mitcho.com/blog/projects/judging-noun-types/#comments</comments>
		<pubDate>Wed, 29 Jul 2009 06:39:11 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[nountypes]]></category>
		<category><![CDATA[scoring]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2588</guid>
		<description><![CDATA[Introduction Different arguments are classified into different kinds of nouns in Ubiquity using noun types.1 For example, a string like &#8220;Spanish&#8221; could be construed as a language, while &#8220;14.3&#8221; should not be. These kinds of relations are then used by the parser to introduce, for example, language-related verbs (like translate) using the former argument, and [...]


Related posts:<ol><li><a href='http://mitcho.com/blog/projects/nountype-quirks-day-1/' rel='bookmark' title='Permanent Link: Nountype Quirks: Day 1'>Nountype Quirks: Day 1</a></li>
<li><a href='http://mitcho.com/blog/projects/nountype-quirks-day-2/' rel='bookmark' title='Permanent Link: Nountype Quirks: Day 2'>Nountype Quirks: Day 2</a></li>
<li><a href='http://mitcho.com/blog/projects/nountype-quirks-day-3/' rel='bookmark' title='Permanent Link: Nountype Quirks: Day 3: Geo Day'>Nountype Quirks: Day 3: Geo Day</a></li>
</ol>

Related posts brought to you by <a href='http://mitcho.com/code/yarpp/'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>

<p>Different arguments are classified into different kinds of nouns in Ubiquity using <em>noun types</em>.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> For example, a string like &#8220;Spanish&#8221; could be construed as a language, while &#8220;14.3&#8221; should not be. These kinds of relations are then used by the parser to introduce, for example, language-related verbs (like <code>translate</code>) using the former argument, and number-related verbs (like <code>zoom</code> or <code>calculate</code>) based on the latter. Ubiquity nountypes aren&#8217;t exclusive—a single string can count as valid for a number of different nountypes and in particular the &#8220;arbitrary text&#8221; nountype (<code>noun_arb_text</code>) will always accept any string given.</p>

<p>In addition to the <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/tip/ubiquity/modules/nountypes.js">various built-in nountypes</a>, Ubiquity lets command authors <a href="https://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_Source_Tip_Author_Tutorial#Writing_a_Noun_Type_Object">write their own nountypes</a> as well.</p>

<h3>The functions of a noun type</h3>

<p>Nountypes have two functions: the first is <strong>accepting and suggesting</strong> values and the second is <strong>scoring</strong> them.</p>

<p><span id="more-2588"></span></p>

<h4>Accepting and suggesting</h4>

<p>Nountypes don&#8217;t just have to accept the exact string they were given—they can also return suggestions which are based on that input. For example, the <code>noun_type_language</code> can take the input &#8220;span&#8221; and return &#8220;Spanish.&#8221; A nountype can return multiple suggestions which may or may not include the trivial suggestion, i.e. the original input as is. If there is no way that that input could possibly be part of an accepted value, it should return no suggestions, i.e. <code>[]</code>.<sup id="fnref:3"><a href="#fn:3" rel="footnote">2</a></sup></p>
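
<p>To make the accept-and-suggest behavior concrete, here is a minimal sketch. It is not the real <code>noun_type_language</code>: the object name, the short language list and the plain-string return values are all assumptions made for illustration, and real Ubiquity nountypes return richer suggestion objects.</p>

```javascript
// Hypothetical, simplified nountype; not Ubiquity's real noun_type_language.
var LANGUAGES = ["Spanish", "Swedish", "French", "Dutch"];

var sketchLanguageNountype = {
  // Returns every known language whose name starts with the input
  // (case-insensitive), or [] when nothing could possibly match.
  suggest: function (input) {
    var typed = input.toLowerCase();
    if (typed === "") return [];
    return LANGUAGES.filter(function (name) {
      return name.toLowerCase().indexOf(typed) === 0;
    });
  }
};

sketchLanguageNountype.suggest("span"); // ["Spanish"]
sketchLanguageNountype.suggest("s");    // ["Spanish", "Swedish"]
sketchLanguageNountype.suggest("14.3"); // []
```

<p>Note that &#8220;s&#8221; returns two suggestions and neither of them is the trivial suggestion &#8220;s&#8221; itself.</p>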

<h4>Scoring</h4>

<p>Ubiquity 0.5 with Parser 2 introduced the notion of a nountype suggestion <em>score</em>. For example, two different nountypes can accept the same input, but with different scores. Scores range from 0 to 1 where 1 is a perfect or exact suggestion and 0.1 or so is a very very improbable suggestion.<sup id="fnref:2"><a href="#fn:2" rel="footnote">3</a></sup> These scores are used in the <a href="http://mitcho.com/blog/observation/scoring-for-optimization/">scoring of parses</a>. Because verbs specify certain nountypes for each of their arguments, the scores that individual nountypes return for each argument are a crucial component of the scoring algorithm and can even determine whether a parse is returned or not.</p>

<p>With this in mind, you may be tempted to make your nountype return a score of 1 on any input so your verb will rank highly in the suggestions. While this would work, it would only make your verb annoying and a poor Ubiquity citizen. Appropriate scores must be given to noun suggestions, with higher values reflecting confidence and lower values reflecting imprecision. <em>But how exactly do you figure out what&#8217;s an appropriate value?</em></p>

<h3>Judging nountypes with the Nountype Tuner</h3>

<p>The Nountype Tuner is a new tool I&#8217;ve been building to help both Ubiquity core developers and command authors to check their nountypes against others and to &#8220;tune&#8221; their behavior and scores. The nountype tuner will take your input and throw it against all of the nountypes referenced in your active verbs and display the suggestions returned with their scores. You can think of it as <a href="http://mitcho.com/blog/projects/changes-to-ubiquity-parser-2-and-the-playpen/">the Playpen</a>&#8217;s little sister.</p>

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner.png" alt="tuner.png" border="0" width="650" height="646" /></p>

<p>The Nountype Tuner can be found at <a href='chrome://ubiquity/content/tuner.html'>chrome://ubiquity/content/tuner.html</a>, though I am pretty sure it is broken in Ubiquity 0.5 and 0.5.1. It has been fixed now and I will make sure it&#8217;s in good shape for 0.5.2.</p>
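
<p>Conceptually, the tuner&#8217;s job can be sketched in a few lines. This is not the tuner&#8217;s actual source: the three stand-in nountypes, their names and their scores are assumptions made up for illustration (only the 0.3 for arbitrary text and the 0.5 for a bare URL guess echo values mentioned in this post).</p>

```javascript
// Stand-in nountypes: each takes an input string and returns scored suggestions.
// Names and scores are illustrative, not Ubiquity's real values.
var sketchNountypes = {
  sketch_language: function (input) {
    var typed = input.toLowerCase();
    if (typed === "" || "italian".indexOf(typed) === -1) return [];
    return [{text: "Italian", score: 0.6}];
  },
  sketch_url: function (input) {
    if (!/^\w+$/.test(input)) return [];
    return [{text: "http://" + input, score: 0.5}];
  },
  sketch_arb_text: function (input) {
    return [{text: input, score: 0.3}];
  }
};

// Runs one input against every nountype and lists all suggestions, best first.
function tune(input, nountypes) {
  var rows = [];
  Object.keys(nountypes).forEach(function (name) {
    nountypes[name](input).forEach(function (sugg) {
      rows.push({nountype: name, text: sugg.text, score: sugg.score});
    });
  });
  return rows.sort(function (a, b) { return b.score - a.score; });
}

tune("lian", sketchNountypes).map(function (row) { return row.text; });
// ["Italian", "http://lian", "lian"]
```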

<p>The heart and soul of the Nountype Tuner is this scale:</p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></center></p>

<p>This scale tells you, <em>in plain English</em>, what different scores represent and correspond to, in two sets of vocabulary: &#8220;in terms of a guess&#8221; and &#8220;in terms of a match.&#8221; While still subjective, this scale helps developers judge different input/output pairs and their scores. For example, &#8220;lian&#8221; → &#8220;http://lian&#8221; is given 0.5, so it&#8217;s an okay guess or a possible match&#8230; does that seem right to you? Or &#8220;lian&#8221; → &#8220;Italian&#8221; being between &#8220;okay&#8221; and &#8220;good.&#8221; Appropriate? We can look at such statements, decide how we feel about them, and tweak if necessary.</p>

<h3>Good nountype scores have roots</h3>

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/roots.jpg" alt="roots.jpg" border="0" width="650" height="300" /></p>

<p><small>CC-BY <a href="http://www.flickr.com/photos/aaronescobar/2569091622/">Aaron Escobar</a></small></p>

<p>&#8230;not that kind of root, but more like <a href="http://en.wikipedia.org/wiki/Nth_root">this kind of root</a>&#8230; let me explain&#8230;</p>

<p>When comparing the scores that individual nountypes return for different inputs, we must compare those scores <em>within the same nountype&#8217;s family of suggestions</em> to see if higher scores truly correspond to higher confidence. For example, the language nountype should give the suggestion &#8220;French&#8221; for both the inputs &#8220;f&#8221; and &#8220;fren,&#8221; but the scores of these suggestions should be different—i.e. the score of &#8220;f&#8221; → &#8220;French&#8221; should be much lower than the score for &#8220;fren&#8221; → &#8220;French,&#8221; reflecting the additional informational value. We refer to this relation of the scores of successive prefixes of a single suggestion all returning that same suggestion as the <em>score curve</em> and in general it should be non-decreasing.<sup id="fnref:4"><a href="#fn:4" rel="footnote">4</a></sup></p>
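
<p>One way to check a score curve by hand is sketched below. The helper names are hypothetical, and <code>placeholderScore</code> is only a naive stand-in scorer; any function taking an input and a suggestion could be passed in its place.</p>

```javascript
// Scores every successive prefix of `suggestion` with scoreFn(input, suggestion).
function scoreCurve(suggestion, scoreFn) {
  return suggestion.split("").map(function (ch, i) {
    return scoreFn(suggestion.slice(0, i + 1).toLowerCase(), suggestion);
  });
}

// True when no score is lower than the one before it.
function isNonDecreasing(curve) {
  return curve.every(function (score, i) {
    return i === 0 || score >= curve[i - 1];
  });
}

// Placeholder scorer, for illustration only.
function placeholderScore(input, suggestion) {
  return input.length / suggestion.length;
}

var curve = scoreCurve("French", placeholderScore); // one score per prefix: "f", "fr", ...
isNonDecreasing(curve);           // true
isNonDecreasing([0.5, 0.4, 0.9]); // false
```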

<p>One could say the most trivial score function then is the linear one. For a series of converging prefixes of the same suggestion (&#8220;Dutch&#8221;), under a linear approach we could naively let the score be (length of the input)/(length of the suggestion), as below:</p>

<p><center></p>

<h4>the linear model</h4>

<table>
<tr><th>input</th><td>d</td><td>du</td><td>dut</td><td>dutc</td><td>dutch</td></tr>
<tr><th>output</th><td>Dutch</td><td>Dutch</td><td>Dutch</td><td>Dutch</td><td>Dutch</td></tr>
<tr><th>score</th><td>0.2</td><td>0.4</td><td>0.6</td><td>0.8</td><td>1</td></tr>
</table>

<p></center></p>

<p>This linear model is represented below by the black line.</p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/nth-roots.png" alt="nth-roots.png" border="0" width="299" height="262" /></center></p>

<p>The problem with the linear model is that earlier transitions (additional keystrokes) <em>add more information</em> than the later ones. Once we&#8217;ve entered &#8220;dutc,&#8221; after all, we would like to be <em>pretty darn sure</em> that we mean &#8220;Dutch,&#8221; so the score difference between &#8220;dutc&#8221; and &#8220;dutch&#8221; should be less than the score difference between, say, &#8220;d&#8221; and &#8220;du.&#8221; We want a score curve that looks more like the solid or dotted red lines above.</p>

<p>For this reason, <strong>I strongly advocate the incorporation of an <em>n</em>th-root in the score computation</strong>. <em>N</em>th-rooted score functions over [0,1] have the feature that they are increasing but also that earlier transitions affect the score more than later ones, which is exactly what we&#8217;d like to see. (The solid red line above is <code>x^(1/2)</code> and the dotted one is <code>x^(1/3)</code>.)<sup id="fnref:5"><a href="#fn:5" rel="footnote">5</a></sup></p>
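
<p>As a sketch of one possible reading of this advice (it is not Ubiquity&#8217;s actual scoring code, and both function names are made up here), the linear ratio can simply be raised to the power 1/<em>n</em>:</p>

```javascript
// Linear model: fraction of the suggestion typed so far.
function linearScore(input, suggestion) {
  return input.length / suggestion.length;
}

// nth-root model: the same fraction raised to the power 1/n.
function rootScore(input, suggestion, n) {
  return Math.pow(linearScore(input, suggestion), 1 / n);
}

["d", "du", "dut", "dutc", "dutch"].map(function (input) {
  return rootScore(input, "Dutch", 2).toFixed(2);
});
// ["0.45", "0.63", "0.77", "0.89", "1.00"]
```

<p>With the square root, the step from &#8220;d&#8221; to &#8220;du&#8221; is worth about 0.18 while the step from &#8220;dutc&#8221; to &#8220;dutch&#8221; is worth about 0.11, whereas every step under the linear model is worth exactly 0.2.</p>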

<h3>Conclusion</h3>

<p>Properly tuning both the built-in nountypes and custom nountypes is crucial to producing more accurate and relevant parse suggestions. I&#8217;ll be using the principles and criteria laid out above, combined with the new Nountype Tuner, to <a href="http://ubiquity.mozilla.com/trac/ticket/746">tune the built-in nountypes (trac #746)</a> in the coming days in preparation for our <a href="http://tinyurl.com/lgekyh">0.5.2 release</a>. I invite you to use the Nountype Tuner in 0.5.2 to tune your custom nountypes as well.</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>Or, as I often write them, &#8220;nountypes.&#8221;&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:3">
<p>Note that I didn&#8217;t just say &#8220;if the input is not an accepted value&#8230;&#8221; That&#8217;s because, based on the left-to-right nature of text input, an argument may later become a valid input of a certain nountype with a few more keystrokes. For example, if we had a URL nountype which accepted &#8220;http://mitcho.com&#8221; but not &#8220;http://mitcho&#8221;, any command which took this nountype would not show up in the suggestions while we were typing out &#8220;http://mitcho&#8221;&#8230; but would suddenly appear when we completed the &#8220;.com&#8221;. The best practice here is to suggest a valid value for the initial &#8220;http://mitcho&#8221;, like &#8220;http://mitcho.com&#8221;.<br/>(In reality, I should have said &#8220;initial-to-later nature&#8221; to be fair to right-to-left languages, but you get the idea. Speaking of which, serious consideration of Ubiquity in right-to-left languages is long overdue.)&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:2">
<p>In reality, due to the way parses are scored and the fact that <code>noun_arb_text</code> accepts anything with score 0.3, a suggestion with score below 0.3 is probably not worth even giving out. Notable exceptions are for custom noun types which are used in commands which take multiple arguments&#8230; in these cases, even scores below 0.3 could add up and overtake a <code>noun_arb_text</code> parse, but it&#8217;s rare.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:4">
<p>The idea that successively longer inputs should yield successively higher scores only makes sense (1) when they are converging on the same suggestion output and (2) when these are truly suggestions, not just acceptances. For nountypes which accept the input verbatim, suggestion scores need not increase&#8230; for example &#8220;1&#8221; is just as good a &#8220;number&#8221; as &#8220;1234&#8221; is, so both of their respective suggestions, &#8220;1&#8221; and &#8220;1234&#8221; could be given the same score.&#160;<a href="#fnref:4" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:5">
<p>Unfortunately the Nountype Tuner currently only compares the suggestions of <em>one input</em> across a number of nountypes, not a number of inputs across the same nountype. In the future I&#8217;d like to make the Nountype Tuner be able to produce these sorts of score curves as well.&#160;<a href="#fnref:5" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>


<p>Related posts:<ol><li><a href='http://mitcho.com/blog/projects/nountype-quirks-day-1/' rel='bookmark' title='Permanent Link: Nountype Quirks: Day 1'>Nountype Quirks: Day 1</a></li>
<li><a href='http://mitcho.com/blog/projects/nountype-quirks-day-2/' rel='bookmark' title='Permanent Link: Nountype Quirks: Day 2'>Nountype Quirks: Day 2</a></li>
<li><a href='http://mitcho.com/blog/projects/nountype-quirks-day-3/' rel='bookmark' title='Permanent Link: Nountype Quirks: Day 3: Geo Day'>Nountype Quirks: Day 3: Geo Day</a></li>
</ol></p>
<p>Related posts brought to you by <a href='http://mitcho.com/code/yarpp/'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/judging-noun-types/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Solving a Romantic Problem: Portmanteau&#8217;ed Prepositions</title>
		<link>http://mitcho.com/blog/projects/solving-a-romantic-problem-portmanteaued-prepositions/</link>
		<comments>http://mitcho.com/blog/projects/solving-a-romantic-problem-portmanteaued-prepositions/#comments</comments>
		<pubDate>Mon, 11 May 2009 05:19:17 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[Catalan]]></category>
		<category><![CDATA[French]]></category>
		<category><![CDATA[Italian]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[portmanteau]]></category>
		<category><![CDATA[romance languages]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2019</guid>
		<description><![CDATA[The problem: In many romance languages, prepositions and articles often form portmanteau morphs, combining to form a single word.1 Some examples include (French) à + le > au, de + le > du, (Catalan) a + el > al, de + les > dels, per + el > pel. Italian has a particularly productive system [...]


Related posts:<ol><li><a href='http://mitcho.com/blog/projects/solving-another-romantic-problem/' rel='bookmark' title='Permanent Link: Solving Another Romantic Problem: Weak Pronouns'>Solving Another Romantic Problem: Weak Pronouns</a></li>
<li><a href='http://mitcho.com/blog/projects/inside-the-argument/' rel='bookmark' title='Permanent Link: Inside the Argument'>Inside the Argument</a></li>
<li><a href='http://mitcho.com/blog/observation/wheres-the-verb/' rel='bookmark' title='Permanent Link: Where&#8217;s The Verb?'>Where&#8217;s The Verb?</a></li>
</ol>

Related posts brought to you by <a href='http://mitcho.com/code/yarpp/'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<h3>The problem:</h3>

<p>In many <a href="http://en.wikipedia.org/wiki/Romance_languages">Romance languages</a>, prepositions and articles often form <a href="http://en.wikipedia.org/wiki/Portmanteau">portmanteau morphs</a>, combining to form a single word.<sup id="fnref:2"><a href="#fn:2" rel="footnote">1</a></sup> Some examples include (French) à + le > au, de + le > du, (Catalan) a + el > al, de + les > dels, per + el > pel. Italian has a particularly productive system of portmanteau&#8217;ed prepositions and articles&#8230; I refer you to the <a href="http://en.wikipedia.org/wiki/Contraction_(grammar)#Italian">contraction</a> article on Wikipedia.</p>

<p>As I <a href="http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/">noted a couple weeks ago</a>, however, some combinations do not form portmanteaus.<sup id="fnref:1"><a href="#fn:1" rel="footnote">2</a></sup></p>

<p><span id="more-2019"></span>
<strong>French:</strong></p>

<ol>
<li>à + le > au</li>
<li>à + la > à la</li>
</ol>

<p>The problem with this is that if we use both <em>à</em> and <em>au</em> as delimiters, we may end up passing the definite article to the verb as part of the argument in some cases, but not in other cases.</p>

<ol>
<li>&#8220;<strong>à</strong> la table&#8221; = &#8220;<strong>to</strong> the table&#8221;</li>
<li>&#8220;<strong>au</strong> chat&#8221; = &#8220;<strong>to the</strong> cat&#8221;</li>
</ol>

<h3>The solution:</h3>

<p>The solution is a new step in <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2">the Parser 2 process</a> which normalizes the form of arguments. Each language&#8217;s parser can now optionally define a <code>normalizeArgument()</code> method which takes an argument and returns a list of normalized alternates. Normalized arguments are returned in the form of <code>{prefix: '', newInput: '', suffix: ''}</code>. For example, if you feed &#8220;la table&#8221; to the French <code>normalizeArgument()</code>, it ought to return</p>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#123;</span>prefix<span style="color: #339933;">:</span> <span style="color: #3366CC;">'la '</span><span style="color: #339933;">,</span> newInput<span style="color: #339933;">:</span> <span style="color: #3366CC;">'table'</span><span style="color: #339933;">,</span> suffix<span style="color: #339933;">:</span> <span style="color: #3366CC;">''</span><span style="color: #009900;">&#125;</span><span style="color: #009900;">&#93;</span></pre></div></div>


<p>If there are no possible normalizations, <code>normalizeArgument()</code> should simply return <code>[]</code>. Each alternative returned by <code>normalizeArgument()</code> is substituted into a copy of the possible parses just before nountype detection. The prefixes and suffixes are stored in the argument (as <code>inactivePrefix</code> and <code>inactiveSuffix</code>) so they can be incorporated into the suggestion display.</p>

<p>Here, for example, is how the inactive prefix &#8220;l&#8217;&#8221; is displayed in <a href="chrome://parser-demo/content/index.html">the parser demo</a>. This way the user is told that the &#8220;l&#8217;&#8221; prefix is being ignored, and the nountype detection and verb action can act on the argument &#8220;English&#8221;.<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup> (In the future, of course, we could teach this nountype to accept the Catalan &#8220;anglès&#8221;.)</p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/05/picture-1.png" alt="Picture 1.png" border="0" width="320" height="29" /></center></p>

<p>The easiest way to produce this output is to use the <a href="https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/String/match"><code>String.match()</code></a> method. For examples of <code>normalizeArgument()</code> code, I refer you to the <a href="http://ubiquity.mozilla.com/hg/ubiquity-firefox/file/12f5d9abf011/ubiquity/modules/parser/new/ca.js">Catalan</a> and <a href="http://ubiquity.mozilla.com/hg/ubiquity-firefox/file/12f5d9abf011/ubiquity/modules/parser/new/fr.js">French</a> parser files.</p>
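
<p>For a rough idea of the shape such a method might take, here is a minimal sketch for French definite articles. It is written as a standalone function rather than as a method on the parser object, the regular expression is an assumption for illustration, and the linked <code>fr.js</code> may well do this differently.</p>

```javascript
// Hypothetical sketch of a French normalizeArgument(); the real fr.js may differ.
function normalizeArgument(input) {
  // Look for a leading definite article: "le ", "la ", "les " or the elided "l'".
  var matches = input.match(/^(le\s+|la\s+|les\s+|l')(.+)$/i);
  if (matches === null) return [];
  return [{prefix: matches[1], newInput: matches[2], suffix: ''}];
}

normalizeArgument("la table");  // [{prefix: 'la ', newInput: 'table', suffix: ''}]
normalizeArgument("l'anglais"); // [{prefix: "l'", newInput: 'anglais', suffix: ''}]
normalizeArgument("table");     // []
```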

<p>I hope that this solution will help make Ubiquity with Parser 2 feel <a href="http://mitcho.com/blog/projects/how-natural-should-a-natural-interface-be/">more natural</a> for many Romance languages.</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:2">
<p>Thanks to <a href="http://people.ucsc.edu/~jpobrien/">Jeremy O&#8217;Brien</a> for helping me figure out how to refer to this phenomenon.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:1">
<p>This also relates to the issue of <a href="http://ubiquity.mozilla.com/trac/ticket/671">parsing multi-word delimiters</a>, though the argument normalization strategy covered here should reduce the necessity of multi-word delimiters.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:3">
<p>Thank you to contributor <a href="http://www.cau.cat/blog/">Toni Hermoso Pulido</a> for our first attempt at a Catalan parser!&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>


<p>Related posts:<ol><li><a href='http://mitcho.com/blog/projects/solving-another-romantic-problem/' rel='bookmark' title='Permanent Link: Solving Another Romantic Problem: Weak Pronouns'>Solving Another Romantic Problem: Weak Pronouns</a></li>
<li><a href='http://mitcho.com/blog/projects/inside-the-argument/' rel='bookmark' title='Permanent Link: Inside the Argument'>Inside the Argument</a></li>
<li><a href='http://mitcho.com/blog/observation/wheres-the-verb/' rel='bookmark' title='Permanent Link: Where&#8217;s The Verb?'>Where&#8217;s The Verb?</a></li>
</ol></p>
<p>Related posts brought to you by <a href='http://mitcho.com/code/yarpp/'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/solving-a-romantic-problem-portmanteaued-prepositions/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Adding Your Language to Ubiquity Parser 2</title>
		<link>http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/</link>
		<comments>http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/#comments</comments>
		<pubDate>Wed, 29 Apr 2009 11:44:20 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[how to]]></category>
		<category><![CDATA[argument structure]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[case marking]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[French]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[l10n]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[localization]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[semantic roles]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1956</guid>
		<description><![CDATA[NOTE: This blog post has now been added to the Ubiquity wiki and is updated there. Please disregard this article and instead follow these instructions. You&#8217;ve seen the video. You speak another language. And you&#8217;re wondering, &#8220;how hard is it to add my language to Ubiquity with Parser 2?&#8221; The answer: not that hard. With [...]


Related posts:<ol><li><a href='http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/' rel='bookmark' title='Permanent Link: Ubiquity Parser: The Next Generation Demo'>Ubiquity Parser: The Next Generation Demo</a></li>
<li><a href='http://mitcho.com/blog/projects/rolling-out-the-roles/' rel='bookmark' title='Permanent Link: Rolling out the Roles'>Rolling out the Roles</a></li>
<li><a href='http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/' rel='bookmark' title='Permanent Link: Foxkeh demos Ubiquity Parser: The Next Generation'>Foxkeh demos Ubiquity Parser: The Next Generation</a></li>
</ol>

Related posts brought to you by <a href='http://mitcho.com/code/yarpp/'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p><strong>NOTE: This blog post has now been added to the Ubiquity wiki and is updated there. Please disregard this article and instead follow <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2/Localization_Tutorial">these instructions</a>.</strong></p>

<p>You&#8217;ve <a href="http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/">seen the video</a>. You speak another language. And you&#8217;re wondering, <strong>&#8220;how hard is it to add my language to Ubiquity with Parser 2?&#8221;</strong> The answer: <strong>not that hard.</strong> With a little bit of JavaScript and knowledge of and interest in your own language, you&#8217;ll be able to get at least rudimentary Ubiquity functionality in your language. Follow along in this step by step guide and please <a href="http://ubiquity.mozilla.com/trac/ticket/662">submit your (even incomplete) language files</a>!</p>

<p><em>As Ubiquity Parser 2 evolves, there is a chance that this specification will change in the future. Keep abreast of such changes on the <a href="http://ubiquity.mozilla.com/planet/">Ubiquity Planet</a> and/or <a href="http://mitcho.com/blog/">this blog</a> (<a href="http://mitcho.com/blog/feed/blog-only/">RSS</a>).</em></p>

<p><span id="more-1956"></span></p>

<h3>Set up your environment</h3>

<p>If you&#8217;re new to Ubiquity core development, you&#8217;ll want to first read the <a href="http://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_0.1_Development_Tutorial">Ubiquity 0.1 Development Tutorial</a> to learn how to get a live copy of the Ubiquity repository using <a href="http://en.wikipedia.org/wiki/Mercurial">Mercurial</a>. Once you&#8217;ve set up your Firefox profile to use this development version, make sure to try changing the <code>extensions.ubiquity.parserVersion</code> value to 2 in <code>about:config</code> (as seen in <a href="http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/">this demo video</a>) to verify that Parser 2 is working for you.</p>

<p>As you read along, you may find it beneficial to follow along in the languages currently included in Parser 2: <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/en.js">English</a>, <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/ja.js">Japanese</a>, <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/pt.js">Portuguese</a>, and <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/sv.js">Swedish</a> (and the incomplete <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/zh.js">Chinese</a> and <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/fr.js">French</a>).</p>

<h3>The structure of the language file</h3>

<p>Each language in Parser 2 gets its own file which acts as a <a href="https://developer.mozilla.org/En/Using_JavaScript_code_modules">JavaScript module</a>. You&#8217;ll need to look up the <a href="http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes">ISO 639-1 code for your language</a>&#8230; Here we&#8217;ll use English (code <code>en</code>) as an example: the JavaScript language file would then be called <code>en.js</code> and go in the <code>/ubiquity/modules/parser/new/</code> directory of the repository.</p>

<p>Here is the basic template for a Ubiquity Parser 2 language file:</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
</pre></td><td class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">var</span> EXPORTED_SYMBOLS <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span><span style="color: #3366CC;">&quot;makeEnParser&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000066; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">typeof</span> window<span style="color: #009900;">&#41;</span> <span style="color: #339933;">==</span> <span style="color: #3366CC;">'undefined'</span><span style="color: #009900;">&#41;</span> <span style="color: #006600; font-style: italic;">// kick it chrome style</span>
  Components.<span style="color: #660066;">utils</span>.<span style="color: #003366; font-weight: bold;">import</span><span style="color: #009900;">&#40;</span><span style="color: #3366CC;">&quot;resource://ubiquity/modules/parser/new/parser.js&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #003366; font-weight: bold;">function</span> makeEnParser<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #003366; font-weight: bold;">var</span> en <span style="color: #339933;">=</span> <span style="color: #003366; font-weight: bold;">new</span> Parser<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'en'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
...
&nbsp;
  <span style="color: #000066; font-weight: bold;">return</span> en<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>


<p>After lines 1-4, which set up the <a href="https://developer.mozilla.org/En/Using_JavaScript_code_modules">JavaScript module</a>, everything else is wrapped in a factory function called <code>makeLaParser</code> (for Latin), <code>makeEnParser</code> (for English, <code>en</code>), <code>makeFrParser</code> (for French, <code>fr</code>), etc. This function initializes the new <code>Parser</code> object (line 7) with the appropriate language code, sets a number of parameters (elided above), and returns it. That&#8217;s it!</p>

<p>Now let&#8217;s walk through some of the parameters you must set to get your language working. For reference, the properties the language parser object is required to have are: <code>branching</code>, <code>anaphora</code>, and <code>roles</code>.</p>

<h3>Identifying your branching parameter</h3>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;">  en.<span style="color: #660066;">branching</span> <span style="color: #339933;">=</span> <span style="color: #3366CC;">'right'</span><span style="color: #339933;">;</span> <span style="color: #006600; font-style: italic;">// or 'left'</span></pre></div></div>


<p>One of the first things you&#8217;ll have to set for your parser is <strong>the <code>branching</code> parameter</strong>. Ubiquity Parser 2 uses the branching parameter to decide in which direction to look for an argument after finding a delimiter or &#8220;role marker&#8221; (most often, these are <a href="http://en.wikipedia.org/wiki/adposition">prepositions or postpositions</a>). For example, in English &#8220;from&#8221; is a delimiter for the <code>source</code> role and its argument is on its right.</p>

<table>
<tr><td>&nbsp;</td><td>&nbsp;</td><td colspan='2' style='background: transparent url(http://mitcho.com/i/cccarrow-right.png) no-repeat right bottom'>&nbsp;</td></tr>
<tr><td><b>to</b></td><td>Mary</td><td><b>from</b></td><td>John</td></tr>
</table>

<p>So &#8220;John&#8221; is a possible argument for the <code>source</code> role, but &#8220;Mary&#8221; should not be. Ubiquity can figure this out because English has the property <code>en.branching = 'right'</code>.</p>

<p>In Japanese, on the other hand, the argument of a delimiter like から (&#8220;from&#8221;) is found on the left of that delimiter, so the Japanese parser uses <code>.branching = 'left'</code>.</p>

<table>
<tr><td colspan='2' style='background: transparent url(http://mitcho.com/i/cccarrow-left.png) no-repeat left bottom'>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><td>メアリー</td><td><b>-から</b></td><td>ジョン</td><td><b>-に</b></td></tr>
<tr><td>Mary</td><td><b>from</b></td><td>John</td><td><b>to</b></td></tr>
</table>

<p>In general, if your language has prepositions, you should use <code>.branching = 'right'</code> and if your language has postpositions, you can use <code>.branching = 'left'</code>.</p>
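<p>To make the direction rule concrete, here is a minimal sketch. The helper <code>argumentSide</code> is hypothetical and not part of Ubiquity; it only picks which side of a delimiter to look at and does not decide where the argument ends, which the real parser has to work out separately.</p>

```javascript
// Hypothetical helper, not part of Ubiquity: given a list of words and the
// index of a delimiter, return the words on the side that the branching
// parameter points to.
function argumentSide(words, delimiterIndex, branching) {
  if (branching === 'right') {
    // prepositions: the argument follows the delimiter
    return words.slice(delimiterIndex + 1);
  }
  if (branching === 'left') {
    // postpositions: the argument precedes the delimiter
    return words.slice(0, delimiterIndex);
  }
  throw new Error("branching must be 'right' or 'left'");
}

var english = ['to', 'Mary', 'from', 'John'];
var japanese = ['メアリー', 'から', 'ジョン', 'に'];

console.log(argumentSide(english, 2, 'right'));  // [ 'John' ]
console.log(argumentSide(japanese, 1, 'left'));  // [ 'メアリー' ]
```

<p>In both examples the word meaning &#8220;from&#8221; picks up the neighbour on the side named by <code>branching</code>, matching the two tables above.</p>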

<p><strong>For more info</strong>:</p>

<ul>
<li>see <a href="http://en.wikipedia.org/wiki/Branching (linguistics)">branching</a> on Wikipedia.</li>
</ul>

<h3>Defining your roles</h3>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;">  en.<span style="color: #660066;">roles</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'goal'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'to'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'source'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'from'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'position'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'at'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'position'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'on'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'alias'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'as'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'instrument'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'using'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'instrument'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'with'</span><span style="color: #009900;">&#125;</span>
  <span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span></pre></div></div>


<p>The second required property is the inventory of semantic roles and their corresponding delimiters. Each entry has a <code>role</code> from the <a href="http://mitcho.com/blog/projects/rolling-out-the-roles/">inventory of semantic roles</a> and a corresponding delimiter. Note that this mapping can be <a href="http://en.wikipedia.org/wiki/many-to-many (data model)">many-to-many</a>, i.e., each role can have multiple possible delimiters and different roles can have shared delimiters. Try to make sure to cover all of the roles in the <a href="http://mitcho.com/blog/projects/rolling-out-the-roles/">inventory of semantic roles</a>.</p>
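<p>One way to check your inventory is to group the delimiters by role and see which roles you have covered. The sketch below uses the English list from above as plain data; <code>delimitersByRole</code> is a hypothetical helper written for this illustration, not a function provided by the parser.</p>

```javascript
// The English role inventory from the example above, as plain data.
var roles = [
  {role: 'goal', delimiter: 'to'},
  {role: 'source', delimiter: 'from'},
  {role: 'position', delimiter: 'at'},
  {role: 'position', delimiter: 'on'},
  {role: 'alias', delimiter: 'as'},
  {role: 'instrument', delimiter: 'using'},
  {role: 'instrument', delimiter: 'with'}
];

// Hypothetical helper: build a map from each role to all of its delimiters.
function delimitersByRole(roleList) {
  var grouped = {};
  roleList.forEach(function (entry) {
    if (!grouped[entry.role]) {
      grouped[entry.role] = [];
    }
    grouped[entry.role].push(entry.delimiter);
  });
  return grouped;
}

var grouped = delimitersByRole(roles);
console.log(grouped.position);    // [ 'at', 'on' ]
console.log(grouped.instrument);  // [ 'using', 'with' ]
```

<p>Any role from the inventory that is missing from the grouped result is one your language file does not yet cover.</p>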

<p><strong>For more info:</strong></p>

<ul>
<li><a href="http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/">Writing commands with semantic roles</a></li>
<li><a href="http://mitcho.com/blog/projects/rolling-out-the-roles/">the proposed inventory of semantic roles</a></li>
<li>Wikipedia entry on <a href="http://en.wikipedia.org/wiki/thematic relations">thematic relations</a></li>
</ul>

<h3>Entering your anaphora (&#8220;magic words&#8221;)</h3>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;">  en.<span style="color: #660066;">anaphora</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span><span style="color: #3366CC;">&quot;this&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;that&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;it&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;selection&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;him&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;her&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;them&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span></pre></div></div>


<p>The final required property is the <code>anaphora</code> property which takes a list of &#8220;magic words&#8221;. Currently there is no distinction between all the different <a href="http://en.wikipedia.org/wiki/deixis">deictic</a> <a href="http://en.wikipedia.org/wiki/anaphora (linguistics)">anaphora</a> which might refer to different things.</p>
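<p>Because all magic words are treated alike, recognizing one amounts to a membership test. Here is a minimal sketch, assuming a magic word stands in for the user&#8217;s current selection; <code>isAnaphor</code> and <code>resolveArgument</code> are hypothetical names used only for this illustration.</p>

```javascript
var anaphora = ['this', 'that', 'it', 'selection', 'him', 'her', 'them'];

// Hypothetical helper: is this word one of the magic words?
// No distinction is made between the different words.
function isAnaphor(word) {
  return anaphora.indexOf(word.toLowerCase()) !== -1;
}

// Hypothetical helper: swap a magic word for the selected text
// (an assumption of this sketch), leaving other arguments untouched.
function resolveArgument(argument, selection) {
  return isAnaphor(argument) ? selection : argument;
}

console.log(resolveArgument('it', 'some selected text'));    // some selected text
console.log(resolveArgument('Mary', 'some selected text'));  // Mary
```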

<h3>Special cases</h3>

<p>Some special language features can be handled by overriding the default behavior from <code>Parser</code>. Many of these features are still in the works, however, so we&#8217;d love to get your comments!</p>

<h4>Languages with no spaces</h4>

<p>If your language does not delimit arguments (or words, more generally) with spaces, you will need to write a custom <code>wordBreaker()</code> function and set <code>usespaces = false</code> and <code>joindelimiter = ''</code>. For an example, please take a look at the <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/ja.js">Japanese</a> or <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/zh.js">Chinese</a> language files.</p>
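<p>As a rough illustration of the idea (not the actual <code>wordBreaker()</code> from the Japanese file, whose signature and logic may differ), the sketch below inserts spaces around known delimiters in an unspaced string. The function name <code>breakOnDelimiters</code> is hypothetical, and the approach is deliberately naive: it will also split a word that happens to contain a delimiter.</p>

```javascript
// Hypothetical, naive word breaker: put spaces around every occurrence of
// a known delimiter so that later steps can split the input on spaces.
function breakOnDelimiters(input, delimiters) {
  var output = input;
  delimiters.forEach(function (delimiter) {
    output = output.split(delimiter).join(' ' + delimiter + ' ');
  });
  // collapse runs of whitespace and drop leading/trailing spaces
  return output.replace(/\s+/g, ' ').trim();
}

console.log(breakOnDelimiters('メアリーからジョンに', ['から', 'に']));
// メアリー から ジョン に
```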

<h4>Case marking languages</h4>

<p><strike>If you have a strongly <a href="http://en.wikipedia.org/wiki/grammatical case">case-marked</a> language, you&#8217;ll have to write some rules to identify those different cases in <code>wordBreaker()</code> and then add some extra <code>roles</code> for these case markers, but for a number of languages the current design does not allow an elegant solution for parsing such arguments. Updates to this issue will be posted to <a href="http://ubiquity.mozilla.com/trac/ticket/663">this trac ticket</a>.</p>

<p>In the mean time, however, if you could write a parser even with only the prepositions/postpositions in your language, that would be a great benefit in getting started in your language.</strike> <strong>UPDATE</strong>: a proposal on how to deal with strongly case-marked languages has been written here: <a href="http://mitcho.com/blog/projects/in-case-of-case/">In Case of Case&#8230;</a>.</p>

<h4>Stripping articles</h4>

<p>Some languages have some delimiters which combine with articles. For example, in French, the preposition &#8220;à&#8221; combines with the masculine definite article &#8220;le&#8221; but not &#8220;la&#8221;:</p>

<ol>
<li>à + la = à la</li>
<li>à + le = au</li>
</ol>

<p>You can add both &#8220;à&#8221; and &#8220;au&#8221; as delimiters of the <code>goal</code> role, but then you will get feminine arguments back with the determiner (e.g. &#8220;la table&#8221;) while masculine arguments would be parsed without a determiner (e.g. &#8220;chat&#8221;).</p>

<ol>
<li>&#8220;<b>à</b> la table&#8221; = &#8220;<b>to</b> the table&#8221;</li>
<li>&#8220;<b>au</b> chat&#8221; = &#8220;<b>to the</b> cat&#8221;</li>
</ol>

<p><strike>One possible solution to this is to write a custom <code>cleanArgument()</code> method. After arguments have been parsed and placed in their appropriate roles, each argument text (say, &#8220;la table&#8221; or &#8220;chat&#8221;) is passed to <code>cleanArgument()</code>. You can simply write a <code>cleanArgument()</code> to strip off any &#8220;la &#8221; at the beginning of the input and return it, and both example inputs will get normalized arguments: &#8220;table&#8221; and &#8220;chat&#8221;, respectively.</strike> <strong>UPDATE</strong>: For more up-to-date information on how to deal with these types of articles, please see <a href="http://mitcho.com/blog/projects/solving-a-romantic-problem/">Solving a Romance Problem</a>.</p>

<h3>Test your parser</h3>

<p>Now you can go into <code>about:config</code> and change <code>extensions.ubiquity.language</code> to be your language code and restart. All the verbs and nountypes at this point will remain the same as in the English version, but it should obey the argument structure (the word order and delimiters) of your language.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> If you run into any trouble, feel free to ask for help on the <a href="http://groups.google.com/group/ubiquity-i18n">Ubiquity i18n listhost</a> or find me on the Ubiquity IRC channel (mitcho @ irc.mozilla.org#ubiquity). Of course, once you&#8217;re at a good stopping point, please <a href="http://ubiquity.mozilla.com/trac/ticket/662">contribute your language file to Ubiquity</a>!</p>

<h3>More to come&#8230;</h3>

<p>At this point, you&#8217;ve only localized the <a href="http://en.wikipedia.org/wiki/argument structure">argument structure</a> of your language&#8230; additional work will be required to localize the nountypes and verb names, which is <a href="http://groups.google.com/group/ubiquity-i18n/browse_thread/thread/ab4d876b1ea02d4">the subject of ongoing discussion</a>&#8230; <a href="http://groups.google.com/group/ubiquity-i18n">join the Google Group</a> to get in on the discussion!</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>At this point in time it&#8217;s also possible to test your parser at <code>chrome://parser-demo/content/index.html</code> if you make a couple other changes to your code&#8230; for more information, watch the <a href="http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/">Foxkeh demos Ubiquity Parser TNG</a> video. This option gives you more debug info as well.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>


]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A Demonstration of Ubiquity Parser 2</title>
		<link>http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/</link>
		<comments>http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/#comments</comments>
		<pubDate>Fri, 24 Apr 2009 06:45:31 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[verb]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1845</guid>
		<description><![CDATA[Here&#8217;s a quick demonstration of Ubiquity Parser 2, aka &#8220;the new parser.&#8221; I&#8217;ll show you how you can use the parser yourself and point out some highlights of the new functionality. Ubiquity Parser 2: better noun-first suggestions and command localization from mitcho on Vimeo. Testing Parser 2 requires the latest Ubiquity source, as explained here. [...]


]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a quick demonstration of Ubiquity Parser 2, aka &#8220;the new parser.&#8221; I&#8217;ll show you how you can use the parser yourself and point out some highlights of the new functionality.</p>

<p><object width="649" height="365"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=4307110&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=4307110&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="649" height="365"></embed></object><br /><a href="http://vimeo.com/4307110">Ubiquity Parser 2: better noun-first suggestions and command localization</a> from <a href="http://vimeo.com/mitchoyoshitaka">mitcho</a> on <a href="http://vimeo.com">Vimeo</a>.</p>

<p><span id="more-1845"></span></p>

<p>Testing Parser 2 requires the latest Ubiquity source, as explained <a href="https://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_0.1_Development_Tutorial">here</a>. If you find any problems or suggestions, please add a ticket to <a href="http://ubiquity.mozilla.com/trac/">our trac</a> with the keyword <code>new-parser</code>.</p>

<p>Here are some resources for those of you who would like to read more about different features touched on in this video:</p>

<ul>
<li><a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">The design document for the new parser</a></li>
<li><a href="http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/">Writing commands with semantic roles</a> and <a href="http://mitcho.com/blog/projects/rolling-out-the-roles/">a proposed inventory of semantic roles</a></li>
<li><a href="http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/">Some thoughts on noun-first suggestions and Ubiquity in Japanese</a></li>
</ul>

<p>In the near future we&#8217;ll also be writing up some documentation on how to take advantage of this new parser in your commands as well.</p>


]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Attachment Ambiguity—or—when is the gyudon cheap?</title>
		<link>http://mitcho.com/blog/observation/attachment-ambiguity/</link>
		<comments>http://mitcho.com/blog/observation/attachment-ambiguity/#comments</comments>
		<pubDate>Wed, 15 Apr 2009 06:17:05 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[life]]></category>
		<category><![CDATA[observation]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[attachment ambiguity]]></category>
		<category><![CDATA[food]]></category>
		<category><![CDATA[Japanese culture]]></category>
		<category><![CDATA[Japanese language]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[syntax]]></category>
		<category><![CDATA[Tokyo]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1815</guid>
		<description><![CDATA[Every day on the way to work I walk by a fine establishment known as Yoshinoya (吉野家), Japan&#8217;s largest gyudon (牛丼) chain restaurant. For those of you whose lives have yet to be graced by gyudon, it&#8217;s a bowl of rice topped with beef and onions stewed in a sweet-savory soy-based sauce. Loving gyudon and [...]


]]></description>
			<content:encoded><![CDATA[<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/04/yoshinoya.jpg" alt="yoshinoya.jpg" border="0" width="650" height="328" /></p>

<p>Every day on the way to work I walk by a fine establishment known as <a href="http://en.wikipedia.org/wiki/Yoshinoya">Yoshinoya</a> (吉野家), Japan&#8217;s largest <em>gyudon</em> (牛丼) chain restaurant. For those of you whose lives have yet to be graced by <a href="http://en.wikipedia.org/wiki/gyudon">gyudon</a>, it&#8217;s a bowl of rice topped with beef and onions stewed in a sweet-savory soy-based sauce. Loving gyudon and being a cheapskate, I naturally noticed the recent 50 yen off gyudon promotion at Yoshinoya. The photo above shows part of that sign.</p>

<p>Part of this sign, though, made me think about our <a href="http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/">new Ubiquity parser</a>. In particular, it was the <strong>attachment ambiguity</strong> in the end date of the promotion. The text in the photo above literally is &#8220;April 15th (Wed.) 8PM until&#8221;. (Note that Japanese is a strongly head-final language, and that the &#8220;until&#8221; is a postposition.) There are two possible readings for this expression, as illustrated by the two <a href="http://en.wikipedia.org/wiki/principle of compositionality">composition</a> trees below.</p>

<p><span id="more-1815"></span></p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/04/yoshinoya-trees.jpg" alt="yoshinoya-trees.jpg" border="0" width="658" height="157" /></center></p>

<p>The first tree, on the left, represents the reading &#8220;until (April 15th 8PM)&#8221;, while the second represents two arguments: &#8220;on April 15th&#8221; and &#8220;until 8PM&#8221;. In other words, in the first reading, the promotion begins at some earlier date and extends until April 15th at 8PM while, in the second reading, the promotion is one day only, on April 15th, until 8pm. Such syntactic ambiguities are called &#8220;attachment ambiguities&#8221; in linguistics as it is an ambiguity of where different arguments &#8220;attach&#8221; in a tree representation.</p>

<p>This attachment ambiguity was possible because there was no clear <a href="http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/">marker</a> on &#8220;April 15th,&#8221; which would have disambiguated it as &#8220;on April 15th&#8221;. In fact, in many languages this time position argument comes with no case marker or preposition, or the marker is optional, which makes such arguments difficult to parse. If such a sentence is entered with spaces, the <a href="http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/">Ubiquity Parser: The Next Generation</a> would try a parse where &#8220;8PM&#8221; is the &#8220;until&#8221; or <code>goal</code> argument and &#8220;April 15th&#8221; is an <code>object</code> argument, but it will only check its noun type, not put it in <a href="http://mitcho.com/blog/projects/rolling-out-the-roles/">the correct semantic role</a> (<code>position</code>). Perhaps this is something to think about in the future.</p>
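<p>The two readings can be enumerated mechanically. Here is a minimal sketch, assuming a left-branching language and treating &#8220;April 15th&#8221; as a single word for simplicity; <code>attachments</code> is a hypothetical helper, not how the Ubiquity parser itself is written.</p>

```javascript
// Hypothetical helper: for a postposition at delimiterIndex, list every way
// of attaching the words on its left. Each reading gives the postposition's
// argument (here labelled goal) and whatever is left unmarked.
function attachments(words, delimiterIndex) {
  var readings = [];
  for (var start = delimiterIndex - 1; start >= 0; start--) {
    readings.push({
      goal: words.slice(start, delimiterIndex).join(' '),
      unmarked: words.slice(0, start).join(' ')
    });
  }
  return readings;
}

console.log(attachments(['April 15th', '8PM', 'until'], 2));
// [ { goal: '8PM', unmarked: 'April 15th' },
//   { goal: 'April 15th 8PM', unmarked: '' } ]
```

<p>The first reading is the one-day-only promotion (&#8220;on April 15th&#8221;, &#8220;until 8PM&#8221;); the second is the promotion that runs &#8220;until (April 15th 8PM)&#8221;.</p>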

<p>These types of situations will surely come up as we continue work on the Ubiquity parser, making it essential to look at different languages. <strong>Are there certain kinds of arguments in your language that do not have any word-external markers such as case or prepositions/postpositions?</strong></p>


]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/observation/attachment-ambiguity/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Rolling out the Roles</title>
		<link>http://mitcho.com/blog/projects/rolling-out-the-roles/</link>
		<comments>http://mitcho.com/blog/projects/rolling-out-the-roles/#comments</comments>
		<pubDate>Thu, 09 Apr 2009 07:07:27 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[argument structure]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[proposal]]></category>
		<category><![CDATA[semantic role]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1763</guid>
		<description><![CDATA[Jono and I have recently been working to incorporate the Parser The Next Generation into Ubiquity proper, and this of course involves the process of retooling the standard commands with semantic roles. The first step, however, is to come up with a list of universal semantic roles which the verbs will be rewritten to use [...]


]]></description>
			<content:encoded><![CDATA[<p>Jono and I have recently been working to incorporate the <a href="http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/">Parser The Next Generation</a> into Ubiquity proper, and this of course involves the process of <a href="http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/">retooling the standard commands with semantic roles</a>. The first step, however, is to come up with a list of universal semantic roles which the verbs will be rewritten to use and individual languages&#8217; parsers will be built to identify. Today I have just such a proposal.</p>

<p><span id="more-1763"></span></p>

<h3>Something to consider&#8230;</h3>

<p>As we rewrite these current commands to specify semantic roles instead of specific modifiers, it is important to distinguish between uses of the same English preposition which actually map to different semantic roles. Here are two examples:</p>

<ul>
<li><code>with</code>: English &#8220;with&#8221; can refer to one of two relations: &#8220;together-with&#8221; as in &#8220;share this with Jono&#8221; and &#8220;using-with&#8221; as in &#8220;share this with delicious&#8221; or &#8220;eat this with a fork.&#8221;</li>
<li><code>in</code>: &#8220;in&#8221;, similarly, can refer to two different relations: &#8220;location-in&#8221; as in &#8220;find mexican food in Tokyo&#8221; and &#8220;format-in&#8221; as in &#8220;search Moscow in Russian&#8221; or &#8220;save this page in PDF.&#8221;</li>
</ul>

<p>A quick test for such cases is &#8220;would these markers translate to the same markers in a different language?&#8221; It&#8217;s easy to find a language where the two different &#8220;with&#8221;s and the two different &#8220;in&#8221;s are expressed using different words. <em>With semantic roles in Parser TNG, it&#8217;s okay for multiple semantic roles to share the same delimiters/markers.</em></p>

<h3>A proposed set of semantic roles</h3>

<p>Here is a set of semantic roles which I would like to propose. <em>Keep in mind that these roles should map to morphological features in languages, not necessarily to the type of content in the argument (which is why we also will keep the noun types).</em></p>

<ul>
<li><code>object</code>: direct object (the default or unmarked argument)</li>
<li><code>goal</code>: the goal or end point of (metaphorical) movement or transition

<ul>
<li>example: in English, arguments marked by &#8220;to&#8221;, &#8220;into&#8221;, &#8220;toward&#8221;, etc.</li>
</ul></li>
<li><code>source</code>: the source or starting point of (metaphorical) movement or transition<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>

<ul>
<li>example: in English, arguments marked by &#8220;from&#8221;, &#8220;by&#8221;, etc.</li>
</ul></li>
<li><code>position</code>: refers to a (metaphorical) location which defines the scope of an action, in contrast to <code>goal</code> and <code>source</code>.

<ul>
<li>example: in English, arguments marked by &#8220;in&#8221;, &#8220;at&#8221;, &#8220;near&#8221;, etc.</li>
</ul></li>
<li><code>instrument</code>: a tool or intermediary to be used 

<ul>
<li>example: in English, arguments marked by &#8220;using&#8221; or &#8220;with&#8221;, as in &#8220;bookmark this with delicious.&#8221;</li>
</ul></li>
<li><code>format</code>: describes the intended or expected form of the result

<ul>
<li>example: in English, arguments marked by &#8220;in&#8221; as in &#8220;in PDF form&#8221; or &#8220;in German&#8221;</li>
</ul></li>
<li><code>alias</code>: a name or reference to 

<ul>
<li>example: in English, arguments marked by &#8220;as&#8221; as in &#8220;tag this as new&#8221; or &#8220;login to mail as aza.&#8221;</li>
</ul></li>
</ul>

<p>Note that all three locational roles, <code>goal</code>, <code>source</code>, and <code>position</code>, may be used for both times and places, as the morphological marking of temporal and spatial expressions is often conflated across languages. The appropriate type of referent (time or space) can then be specified with the noun type.</p>
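<p>To make the idea of roles sharing markers concrete, here is a minimal sketch. The role names and English markers come from the list above; the table layout and the <code>rolesForMarker</code> helper are assumptions made for illustration, not Parser TNG code. The unmarked <code>object</code> role is left out because it has no marker.</p>

```javascript
// Hypothetical English role table: each semantic role lists its markers.
// The role names come from the proposal; the data layout is an assumption.
const englishRoles = {
  goal: ["to", "into", "toward"],
  source: ["from", "by"],
  position: ["in", "at", "near"],
  instrument: ["using", "with"],
  format: ["in"],
  alias: ["as"],
};

// Return every role that a given marker could introduce.
function rolesForMarker(roleTable, marker) {
  return Object.keys(roleTable).filter((role) =>
    roleTable[role].includes(marker)
  );
}

console.log(rolesForMarker(englishRoles, "in")); // both position and format
console.log(rolesForMarker(englishRoles, "from")); // only source
```

<p>Because &#8220;in&#8221; comes back with two candidate roles, something else (here, presumably the noun type) has to pick between them.</p>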

<p>As a quick sanity check of this proposal, here are all the standard commands built into Ubiquity which take multiple arguments, together with the semantic role appropriate for each argument:</p>

<table>
<thead>
<tr><th>command</th><th>current modifier</th><th>semantic role</th></tr>
</thead>
<tbody style='font-family: monospace'>
<tr><th>convert</th><td>to</td><td>goal, format</td></tr>
<tr><th>email</th><td>to</td><td>goal</td></tr>
<tr><th rowspan='2'>translate</th><td>to</td><td>goal, format</td></tr>
<tr><td>from</td><td>source</td></tr>
<tr><th>search</th><td>with</td><td>instrument</td></tr>
<tr><th>wikipedia</th><td>in</td><td>format</td></tr>
<tr><th>yelp</th><td>near</td><td>position</td></tr>
<tr><th>weather</th><td>in</td><td>position</td></tr>
<tr><th>twitter</th><td>as</td><td>alias</td></tr>
<tr><th rowspan='2'>share-on-delicious</th><td>tagged</td><td>alias</td></tr>
<tr><td>entitled</td><td>alias</td></tr>
</tbody>
</table>

<p>The only problematic standard command, then, is <code>share-on-delicious</code>, which can take both tags and a title, both of which would most naturally correspond to the <code>alias</code> role. <strong>If you have a suggestion for how best to deal with this type of case, I&#8217;d love to hear it!</strong></p>

<p>We&#8217;d love to get your feedback on this proposed set of semantic roles. <strong>How do you feel about the proposed set of semantic roles laid out here?</strong> In particular, if you have, or can envision, a command that needs a semantic role not covered here or that takes multiple arguments of the same role, please let us know! ^^</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>The <a href="http://scholar.google.com/scholar?q=&quot;types+of+lexical+information&quot;+fillmore">Fillmore (1971)</a> semantic role of &#8220;result&#8221; may also be lumped into this.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>


]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/rolling-out-the-roles/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Ubiquity Commands by The Numbers</title>
		<link>http://mitcho.com/blog/projects/ubiquity-commands-by-the-numbers/</link>
		<comments>http://mitcho.com/blog/projects/ubiquity-commands-by-the-numbers/#comments</comments>
		<pubDate>Wed, 01 Apr 2009 03:11:55 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[herd]]></category>
		<category><![CDATA[localization]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[nountypes]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[verbs]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1718</guid>
		<description><![CDATA[Recent work in the Ubiquity internationalization realm has focused on the upcoming Ubiquity parser which will bring some great new features to Ubiquity, including support for overlord verbs and semi-automatic localization of commands via semantic roles. It&#8217;s possible, though, that these new features will break backwards compatibility of the current command specification and noun types. [...]


]]></description>
			<content:encoded><![CDATA[<p>Recent work in the Ubiquity internationalization realm has focused on the upcoming Ubiquity parser which will bring some great new features to Ubiquity, including support for <a href="http://jonoscript.wordpress.com/2009/01/24/overlord-verbs-a-proposal/">overlord verbs</a> and <a href="http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/">semi-automatic localization of commands via semantic roles</a>. It&#8217;s possible, though, that these new features will break backwards compatibility of the current command specification and noun types. <a href="http://en.wikipedia.org/wiki/Creative destruction">Creative destruction</a> for the win.</p>

<p>As we look to <a href="http://groups.google.com/group/ubiquity-i18n/browse_thread/thread/22fa223f43ef6262">move forward</a> with incorporating <a href="http://mitcho.com/code/ubiquity/parser-demo/">the next generation parser</a> into Ubiquity proper, it becomes important to look at the current command ecosystem to see how disruptive this move might be. To this end, last night I wrote a quick Perl script to scrape the commands cached on <a href="http://ubiquity.mozilla.com/herd/">the herd</a> and get some quantitative answers to my questions.</p>

<p><span id="more-1718"></span></p>

<p>(1577 different verbs were analyzed. None of these computations below are weighted by feed popularity.)</p>

<h3>Q: Are there a lot of commands which use more than one argument?</h3>

<p>A: The vast majority (>85%) of commands take one or no arguments and so require no modifiers. Only the remaining commands (under 15%) will need to switch to referring to their arguments by <a href="http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/">semantic role</a>.</p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/03/herdcommands.png" alt="herdcommands.png" border="0" width="500" height="355" /></center></p>

<h3>Q: Do many commands introduce custom noun types?</h3>

<p>A: 147 different noun types (lumping anonymous inline objects as one type) were detected. The vast majority of all <code>takes</code> (direct object) arguments were of type <code>noun_arb_text</code>, although many <code>modifiers</code> arguments used custom noun types. The other standard (built-in) noun types are well represented as well, with <code>noun_type_language</code> coming in at second place. Here&#8217;s a chart with all the noun types which had more than one use.</p>

<div style='overflow-y: auto; max-height: 300px;'><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/03/herdnountypes1.png" alt="herdnountypes.png" border="0" width="550" height="846" /></center></div>

<h3>Q: Are commands with <code>modifiers</code> using natural-language delimiters?</h3>

<p>A: Most of the modifiers detected were English prepositions such as &#8220;from&#8221;, &#8220;to&#8221;, &#8220;as&#8221;, &#8220;with&#8221;, but other words were also seen such as &#8220;title&#8221;, &#8220;type&#8221;, &#8220;username&#8221;, and &#8220;message&#8221; and even a handful of commands with symbols such as &#8220;@&#8221;, &#8220;>&#8221;, or &#8220;#&#8221;.</p>
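<p>The kind of tally behind these answers could be sketched roughly as follows. This is not the Perl script mentioned above: the record layout, the four sample commands, and both helper functions are hypothetical and only illustrate the counting.</p>

```javascript
// Hypothetical records standing in for commands scraped from the herd.
const sampleCommands = [
  { name: "weather", hasDirectObject: true, modifiers: [] },
  { name: "email", hasDirectObject: true, modifiers: ["to"] },
  { name: "translate", hasDirectObject: true, modifiers: ["to", "from"] },
  { name: "help", hasDirectObject: false, modifiers: [] },
];

// Total arguments = optional direct object + one per modifier.
function argumentCount(command) {
  return (command.hasDirectObject ? 1 : 0) + command.modifiers.length;
}

// Fraction of commands that take more than one argument.
function multiArgumentShare(commands) {
  const multi = commands.filter((c) => argumentCount(c) > 1);
  return multi.length / commands.length;
}

// Count how often each modifier word appears across all commands.
function modifierFrequencies(commands) {
  const counts = new Map();
  for (const command of commands) {
    for (const modifier of command.modifiers) {
      counts.set(modifier, (counts.get(modifier) || 0) + 1);
    }
  }
  return counts;
}

console.log(multiArgumentShare(sampleCommands)); // 0.5 for this toy sample
console.log(modifierFrequencies(sampleCommands).get("to")); // 2
```

<p>Like the figures in this post, the sketch counts every command once and does not weight by popularity.</p>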


]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/ubiquity-commands-by-the-numbers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ubiquity Parser: The Next Generation Demo</title>
		<link>http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/</link>
		<comments>http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/#comments</comments>
		<pubDate>Wed, 18 Mar 2009 03:13:17 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[California]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[interface]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[overlord verbs]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[photo]]></category>
		<category><![CDATA[proposal]]></category>
		<category><![CDATA[semantic role]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[verb-final]]></category>
		<category><![CDATA[verbs]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1590</guid>
		<description><![CDATA[A week or two ago while visiting California, Jono and I had a productive charrette, resulting in a new architecture proposal for the Ubiquity parser, as laid out in Ubiquity Parser: The Next Generation. The new architecture is designed to support (1) the use of overlord verbs, (2) writing verbs by semantic roles, and (3) [...]


]]></description>
			<content:encoded><![CDATA[<p><a href='http://mitcho.com/blog/wp-content/uploads/2009/03/parserdesign.jpg' rel='lightbox[parser]'><img src="http://mitcho.com/blog/wp-content/uploads/2009/03/parserdesign.jpg" alt="parserdesign" title="parserdesign" width="600" height="450" class="limages" /></a></p>

<p>A week or two ago while visiting California, <a href="http://jonoscript.wordpress.com">Jono</a> and I had a productive charrette, resulting in a new architecture proposal for the Ubiquity parser, as laid out in <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">Ubiquity Parser: The Next Generation</a>. The new architecture is designed to support (1) the use of <a href="http://jonoscript.wordpress.com/2009/01/24/overlord-verbs-a-proposal/">overlord verbs</a>, (2) <a href="http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/">writing verbs by semantic roles</a>, and (3) better suggestions for <a href="http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/">verb-final languages</a> and other argument-first contexts. I&#8217;m happy to say that I&#8217;ve spent some time putting a proof-of-concept together.</p>

<p>I&#8217;ve implemented the basic algorithm of this parser for <a href="http://en.wikipedia.org/wiki/left-branching">left-branching</a> languages (like English) and also implemented some fake English verbs, noun types, and semantic roles. This demo should give you a basic sense of how this parser will attempt to identify different types of arguments and check their noun types even without clearly knowing the verb. This should make the suggestion ranking much smarter, particularly for verb-final contexts. (For a good example, try <code>from Tokyo to San Francisco</code>.)</p>
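<p>As a rough illustration of identifying arguments before the verb is known, here is a greedy sketch. It is not the demo&#8217;s algorithm, which builds and ranks several candidate parses; the <code>markerToRole</code> table and the <code>parseArguments</code> function are hypothetical names chosen for this example.</p>

```javascript
// Hypothetical marker table; the demo's real tables may differ.
const markerToRole = { from: "source", to: "goal" };

// Walk the words left to right, switching role at each known marker.
function parseArguments(input) {
  const parsed = {};
  let currentRole = "object"; // unmarked words default to the object role
  for (const word of input.trim().split(/\s+/)) {
    if (Object.prototype.hasOwnProperty.call(markerToRole, word)) {
      currentRole = markerToRole[word];
      continue;
    }
    parsed[currentRole] = parsed[currentRole]
      ? parsed[currentRole] + " " + word
      : word;
  }
  return parsed;
}

console.log(parseArguments("from Tokyo to San Francisco"));
// source is "Tokyo", goal is "San Francisco"
```

<p>With a source and a goal already labelled, a ranker could prefer verbs that accept both roles even though no verb was typed.</p>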

<h3><a href="http://mitcho.com/code/ubiquity/parser-demo/">➔ Check out the Ubiquity next-gen parser demo</a></h3>

<p><span id="more-1590"></span></p>

<p>Clicking on the <em>environment info</em> will give you some information on the specific verbs, noun types, and roles implemented. You can also scroll through the <em>current parse</em> section to see the step by step derivation of how the suggested parses were constructed.</p>

<p>I&#8217;ll be boarding a flight of about 15 hours within the hour as I make my way back to Japan&#8230; hopefully I&#8217;ll make some more progress on the plane! I look forward to your comments! <em>For those of you interested in checking out the code yourself, you can find it on <a href="http://bitbucket.org/mitcho/ubiquity-playground/">BitBucket</a>.</em></p>


]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>User-Aided Disambiguation: a demo</title>
		<link>http://mitcho.com/blog/projects/user-aided-disambiguation-a-demo/</link>
		<comments>http://mitcho.com/blog/projects/user-aided-disambiguation-a-demo/#comments</comments>
		<pubDate>Sat, 14 Mar 2009 06:08:24 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[ambiguity]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[interface]]></category>
		<category><![CDATA[Japanese language]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[jQuery]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[natural syntax]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1572</guid>
		<description><![CDATA[A few weeks ago I made some visual mockups of how Ubiquity could look and act in Japanese. Part of this proposal was what I called &#8220;particle identification&#8221;: that is, immediate in-line identification of delimiters of arguments, which can be overridden by the user: The inspiration for this idea came from Aza&#8217;s blog post &#8220;Solving [...]


]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago I made some visual mockups of <a href="http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/">how Ubiquity could look and act in Japanese</a>. Part of this proposal was what I called &#8220;particle identification&#8221;: that is, immediate in-line identification of delimiters of arguments, which can be overridden by the user:</p>

<p><center><img src='http://mitcho.com/blog/wp-content/uploads/2009/02/particle-id.png'/></center></p>

<p>The inspiration for this idea came from Aza&#8217;s blog post <a href="http://www.azarask.in/blog/post/solving-the-it-problem/">&#8220;Solving the &#8216;it&#8217; problem&#8221;</a>, which advocates this type of quick feedback to the user in cases of ambiguity. Such a method would both help the user better understand what is being interpreted by the system and offer an opportunity for the user to correct improper parses. I just tried mocking up such an input box using <a href="http://jquery.com">jQuery</a>.</p>

<h3>➔ <a href='http://mitcho.com/code/ubiquity/ambiguity-demo/'>Try the User-Aided Disambiguation Demo</a></h3>

<p>If you have any bugfixes to submit or want to play around with your own copy, the demo code is <a href="http://bitbucket.org/mitcho/ubiquity-parser-tng/">up on BitBucket</a>. ^^ Let me know what you think!</p>


]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/user-aided-disambiguation-a-demo/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Writing commands with semantic roles</title>
		<link>http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/</link>
		<comments>http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/#comments</comments>
		<pubDate>Tue, 24 Feb 2009 08:05:23 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[argument structure]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[coding properties]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[proposal]]></category>
		<category><![CDATA[semantic role]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[verbs]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1497</guid>
		<description><![CDATA[Thank you to everyone who contributed data to how your language identifies its arguments! The data collection is ongoing so please contribute data points for languages you know! How Ubiquity identifies its arguments Currently when writing a command in Ubiquity you must specify two properties for each argument: a modifier (the appropriate adposition—the direct object [...]


]]></description>
			<content:encoded><![CDATA[<p><em>Thank you to everyone who contributed data to <a href="http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/">how your language identifies its arguments</a>! The data collection is ongoing so please contribute data points for languages you know!</em></p>

<h3>How Ubiquity identifies its arguments</h3>

<p>Currently <a href="https://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_0.1_Author_Tutorial">when writing a command</a> in Ubiquity you must specify two properties for each argument: a modifier (the appropriate <a href="http://en.wikipedia.org/wiki/adposition">adposition</a>—the direct object excluded) and the <a href="https://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_0.1_Nountypes_Reference">noun type</a>. Here are some quick examples from the standard commands:</p>

<p><code>email</code>:</p>

<ul>
<li>direct object (<code>noun_arb_text</code>)</li>
<li><code>to</code> (<code>noun_type_contact</code>)</li>
</ul>

<p><code>translate</code>:</p>

<ul>
<li>direct object (<code>noun_arb_text</code>)</li>
<li><code>to</code> (<code>noun_type_language</code>)</li>
<li><code>from</code> (<code>noun_type_language</code>)</li>
</ul>
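<p>Written out as data, the two specifications above might look like the sketch below. It uses plain objects with <code>takes</code> and <code>modifiers</code> keys and writes the noun types as strings so that it runs outside Ubiquity; it is an illustration of the shape, not the actual command source, and <code>wordsToLocalize</code> is a hypothetical helper.</p>

```javascript
// Plain-object sketch of the current argument specs listed above.
// Noun types are written as strings so the snippet runs outside Ubiquity.
const currentSpecs = {
  email: {
    takes: "noun_arb_text",
    modifiers: { to: "noun_type_contact" },
  },
  translate: {
    takes: "noun_arb_text",
    modifiers: { to: "noun_type_language", from: "noun_type_language" },
  },
};

// The modifier keys are English words, so each one would need a
// hand-written counterpart in every localized version of the command.
function wordsToLocalize(spec) {
  return Object.keys(spec.modifiers);
}

console.log(wordsToLocalize(currentSpecs.translate)); // the keys "to" and "from"
```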

<p>This way of specifying arguments has a few shortcomings. First, it requires you to identify each type of argument by a unique adposition, which supports neither languages with <a href="http://en.wikipedia.org/wiki/case marking">case marking</a> nor languages with sets of synonymous adpositions (e.g. French {à la, au, aux}). Second, as we saw in <a href="http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/">how your language identifies its arguments</a>, some languages don&#8217;t mark semantic roles on the arguments at all, and the current system of specifying arguments is completely incompatible with these languages. Third, the current specification requires command authors to make localized versions of their commands, specifying the language-appropriate modifiers.</p>

<p><span id="more-1497"></span></p>

<p>In a perfect world the last issue could be solved (at least for languages which mark semantic roles with adpositions) by a mapping of English prepositions to the target language adpositions. Indeed, for some adpositions in some languages this may be possible:</p>

<table border='0'>
<tr><th colspan='2'>English/Ubiquity</th><th>Chinese</th><th>Japanese</th></tr>
<tr><td>to</td><td rowspan='2'>=></td><td>到 (dào)</td><td>-に (-ni)</td></tr>
<tr><td>from</td><td>从 (cóng)</td><td>-から (-kara)</td></tr>
</table>

<p>However, some English prepositions do not map cleanly to a particular adposition. Take, for example, English &#8220;with.&#8221; This &#8220;with&#8221; may map to different markings in Chinese and Japanese depending on the sentence:</p>

<table border='0'>
<tr><th colspan='2'>English</th><th>Chinese</th><th>Japanese</th></tr>
<tr><td>share <strong>with</strong> Jono</td><td rowspan='2'>=></td><td>跟 (gēn)</td><td>-と (-to)</td></tr>
<tr><td>translate <strong>with</strong> Google</td><td>用 (yòng)</td><td>-で (-de)</td></tr>
</table>

<p>Note, however, that which set of markings &#8220;with&#8221; maps to is predictable, as there is a salient semantic difference. The first &#8220;with&#8221; could be referred to as <em>together-with</em> while the second is a <em>using-with</em>. With this distinction, we can easily predict which paradigm the &#8220;with&#8221; in &#8220;search <strong>with</strong> Google&#8221; should use, because these two &#8220;with&#8221; arguments represent two different <em>semantic roles</em>.</p>
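<p>The two tables above can be folded into one lookup keyed by semantic role rather than by English word. This is only a sketch: the markers are the ones in the table, while the key names <code>togetherWith</code> and <code>usingWith</code> and the <code>markerFor</code> helper are labels invented for this example.</p>

```javascript
// Markers taken from the table above, keyed by role instead of by English word.
// The role key names are hypothetical labels for this sketch.
const markersByRole = {
  togetherWith: { english: "with", chinese: "跟", japanese: "と" },
  usingWith: { english: "with", chinese: "用", japanese: "で" },
};

// Look up how a given role is marked in a given language.
function markerFor(role, language) {
  return markersByRole[role][language];
}

// "search with Google" is a using-with, so it takes the second row.
console.log(markerFor("usingWith", "japanese")); // で
console.log(markerFor("togetherWith", "chinese")); // 跟
```

<p>English collapses both rows into &#8220;with&#8221;, which is why a word-for-word mapping from English fails while a role-based one does not.</p>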

<h3>A proposal: identifying arguments by semantic role<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></h3>

<p>Suppose commands could specify their arguments by referring to these <em>semantic roles</em> in lieu of adpositions as they currently do. This way, we would be able to automatically map commands into different languages. For example, you could write a new command called <code>move</code> with the following argument structure:</p>

<p><code>move</code>:</p>

<ul>
<li><code>role_object</code> (<code>noun_arb_text</code>)</li>
<li><code>role_goal</code> (<code>noun_type_geolocation</code>)</li>
<li><code>role_source</code> (<code>noun_type_geolocation</code>)</li>
</ul>

<p>The English mapping of &#8216;&#8217; (no marker) => <code>role_object</code>, &#8216;to&#8217; => <code>role_goal</code>, &#8216;from&#8217; => <code>role_source</code> could be used to parse the command:</p>


<div class="wp_syntax"><div class="code"><pre class="english" style="font-family:monospace;">move truck from Tokyo to Paris</pre></div></div>


<p>In addition, with the Japanese mapping of &#8216;を&#8217; => <code>role_object</code>, &#8216;に&#8217; => <code>role_goal</code>, &#8216;から&#8217; => <code>role_source</code>, you could immediately use the command in Japanese as well:</p>


<div class="wp_syntax"><div class="code"><pre class="japanese" style="font-family:monospace;">東京からパリにトラックをmoveして</pre></div></div>


<p>In essence, this proposal would let command authors get their commands localized <em>for free</em>, as long as they stick to a predefined set of semantic roles. For more complex commands and legacy commands, of course, commands could optionally specify particular English modifiers, but then Ubiquity would simply not attempt to localize those commands.</p>
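<p>Here is a minimal sketch of how one role-based definition could serve both languages. Everything in it is an assumption made for illustration: the language tables, the <code>markerFirst</code> flag, and the <code>assignRoles</code> function are hypothetical, the input is already split into tokens (real Japanese segmentation is not shown), and the verb itself is left out. The Japanese table uses the object particle を, as in the example sentence above.</p>

```javascript
// Hypothetical language tables: marker to role, plus where the marker sits.
const english = {
  markerFirst: true, // prepositions come before their argument
  markers: { to: "goal", from: "source" },
};
const japanese = {
  markerFirst: false, // particles come after their argument
  markers: { "を": "object", "に": "goal", "から": "source" },
};

// Assign roles to pre-tokenized arguments (the verb is omitted).
function assignRoles(tokens, language) {
  const roles = {};
  const isMarker = (t) =>
    Object.prototype.hasOwnProperty.call(language.markers, t);
  // English: the role announced by the last preposition.
  // Japanese: the argument still waiting for its particle.
  let pending = null;
  for (const token of tokens) {
    if (language.markerFirst) {
      if (isMarker(token)) {
        pending = language.markers[token];
      } else {
        roles[pending || "object"] = token; // unmarked means object
        pending = null;
      }
    } else if (isMarker(token)) {
      const role = language.markers[token];
      roles[role] = pending;
      pending = null;
    } else {
      pending = token;
    }
  }
  return roles;
}

console.log(assignRoles(["truck", "from", "Tokyo", "to", "Paris"], english));
console.log(assignRoles(["東京", "から", "パリ", "に", "トラック", "を"], japanese));
```

<p>Both calls fill the same three roles (<code>object</code>, <code>source</code>, <code>goal</code>), so a single <code>move</code> implementation could consume either result.</p>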

<p>In addition, each language-specific parser would determine how to identify its arguments. This would allow languages with case marking, or with no role marking on arguments at all, to handle their own mapping of arguments to semantic roles and still use shared commands. Even the English parser would benefit, since the parser itself could deal with synonymous prepositions and possibly even argument structure alternations (such as English <a href="http://en.wikipedia.org/wiki/ditransitive alternations">ditransitive alternations</a>).</p>

<p>As a starting point, we could use argument types based on the list of semantic roles given in <a href="http://scholar.google.com/scholar?q=&quot;types+of+lexical+information&quot;+fillmore">Fillmore (1971)</a>:</p>

<ul>
<li>Object: the entity that moves or changes or whose position or existence is in consideration</li>
<li>Result: the entity that comes into existence as a result of the action</li>
<li>Instrument: the stimulus or immediate physical cause of an event</li>
<li>Source: the place from which something moves</li>
<li>Goal: the place to which something moves</li>
<li>Experiencer: the entity which receives or accepts or experiences or undergoes the effect of an action &#8230;</li>
</ul>

<h3>Comments welcome!</h3>

<p><strong>As command authors and Ubiquity users, how do you feel about this proposal? How might this affect, simplify, or complicate the localization of Ubiquity into your language?</strong> Thank you in advance! ^^</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>Thank you to <a href="http://jonoscript.wordpress.com">Jono</a> and <a href="http://theunfocused.net/">Blair</a> whose comments in <a href="https://wiki.mozilla.org/Labs/Ubiquity/Meetings/2009-02-23_i18n_Meeting">our i18n meeting</a> helped shape this proposal.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>


]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Ubiquity in Firefox: Focus on Japanese</title>
		<link>http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/</link>
		<comments>http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/#comments</comments>
		<pubDate>Fri, 20 Feb 2009 11:08:14 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[argument structure]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[Firefox]]></category>
		<category><![CDATA[interface]]></category>
		<category><![CDATA[Japanese language]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[mockup]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[verbs]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1466</guid>
		<description><![CDATA[One of the eventual goals of the Ubiquity project is to bring some of its functionality and ideas to Firefox proper. To this end, Aza has been exploring some possible options for what that would look like (round 1, round 2). All of his mockups, however, use English examples. I&#8217;m going to start exploring what [...]


Related posts:<ol><li><a href='http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/' rel='bookmark' title='Permanent Link: Three ways to argue over arguments'>Three ways to argue over arguments</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/' rel='bookmark' title='Permanent Link: Ubiquity Parser: The Next Generation Demo'>Ubiquity Parser: The Next Generation Demo</a></li>
<li><a href='http://mitcho.com/blog/link/ubiquity-in-portuguese/' rel='bookmark' title='Permanent Link: Ubiquity in Portuguese'>Ubiquity in Portuguese</a></li>
</ol>

Related posts brought to you by <a href='http://mitcho.com/code/yarpp/'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>One of the eventual goals of the <a href="http://ubiquity.mozilla.com">Ubiquity project</a> is to bring some of its functionality and ideas to Firefox proper. To this end, <a href="http://azarask.in">Aza</a> has been exploring some possible options for what that would look like (<a href="http://www.azarask.in/blog/post/ubiquity-in-firefox-round-1/">round 1</a>, <a href="http://www.azarask.in/blog/post/ubiquity-in-the-firefox-round-2/">round 2</a>). All of his mockups, however, use English examples. I&#8217;m going to start exploring what Ubiquity in Firefox might look like in different kinds of languages. Let&#8217;s kick this off with my mother tongue, Japanese.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<p><em>今後多様な言語に対応したFirefox内のUbiquityを検討していきますが、その中でも今日は日本語をとりあげます。後日日本語で同じ内容を投稿するつもりです。^^</em> <strong>日本語でのコメントも大歓迎です！</strong></p>

<p><span id="more-1466"></span></p>

<h3>What commands look like in Japanese</h3>

<p>Japanese is not just a verb-final language; it is strongly <a href="http://en.wikipedia.org/wiki/head-final">head-final</a>, meaning it has postpositions instead of prepositions, direct objects come before verbs, and adjectives precede nouns. In terms of <a href="http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/">how it identifies its arguments</a>, every argument has a postposition/case marker (called a <em>particle</em> in the Japanese literature) which marks its role in the sentence.</p>

<p>A couple common particles we&#8217;ll look at in this example include -を (<em>-o</em>) which marks the direct object (accusative case, you might say) and -に (<em>-ni</em>) which acts like English &#8220;to&#8221; (dative case). The example sentence we&#8217;ll look at today is:</p>

<table border='0'>
<tr><td>ケーキを</td><td>ブレアに</td><td>送って</td><td>(ください)</td></tr>
<tr><td><em>kēki-o</em></td><td><em>burea-ni</em></td><td><em>okuʔte</em></td><td><em>kudasai</em></td></tr>
<tr><td>cake.ACC</td><td>Blair.DAT</td><td>send.IMP</td><td>&#8220;please&#8221;</td></tr>
<tr><td colspan='4'>&#8220;Please send a cake to Blair.&#8221;</td></tr>
</table>

<p>(Note: ʔ is a <a href="http://en.wikipedia.org/wiki/glottal stop">glottal stop</a>. ACC=accusative, DAT=dative, and IMP=imperative form.)</p>

<p>That final ください is often dropped in very casual speech and, as it adds no new information, we&#8217;ll assume today that the user will not enter it. Finally, Japanese doesn&#8217;t use spaces in its orthography, so the actual input would be &#8220;ケーキをブレアに送って&#8221;.</p>

<h3>Mockup 1: Particle identification</h3>

<p>One of the major hurdles in working with Japanese is that there are no spaces between the words. The natural first step is to split the sentence up into words, but this is a very difficult problem in <a href="http://en.wikipedia.org/wiki/Natural Language Processing">NLP</a> which <a href="http://research.microsoft.com/en-us/projects/japanesenlp/default.aspx">big name research groups</a> actively work on.</p>

<p>Fortunately, however, in <a href="http://www.azarask.in/blog/post/solving-the-it-problem/">&#8220;Solving the &#8216;It&#8217; Problem&#8221;</a> Aza suggests that, when we encounter ambiguity in our input, we can <em>go ask the user</em>. Great minds think alike, and computer scientist <a href="http://en.wikipedia.org/wiki/Jean E. Sammet">Jean E. Sammet</a> suggested the same idea <a href="http://doi.acm.org/10.1145/365230.365274">way back in 1966</a>:</p>

<blockquote>
  <p>Using English [or any other natural language] definitely involves the requirement for the computer (or more accurately its programming system) to query the user about any possible ambiguity.</p>
</blockquote>

<p>Parsing a sentence into words, in the limited context of Ubiquity, is really about identifying the particles which mark the end of each argument. Here&#8217;s a mockup of an application of the Sammet-Raskin Method to this problem:</p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/02/particle-id.png" alt="particle-id.png" border="0" /></center></p>

<p><strong>Pros:</strong> This completely takes care of the word-breaking problem, with minimal arbitration from the user. The parser knows <em>exactly</em> what arguments it&#8217;s dealing with and the visual feedback means the user won&#8217;t be surprised by the parse.</p>

<p><strong>Cons:</strong> Most of the particles/postpositions we&#8217;d have to deal with are a single character, so they may show up pretty often within words, in which case it would be quite annoying to have to press escape after each one.</p>

<p>An even smarter system, when wanting to mark a character as a particle, would first check to see that the argument (before the particle) is a valid argument type for that particle. If the check fails, it doesn&#8217;t have to bother with suggesting that character as a particle. This may cut down on the false positives.</p>
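
<p>As a rough illustration of that check, here is a minimal Python sketch. The particle table, the <code>is_valid_argument</code> helper, and the tiny contact list are all hypothetical stand-ins rather than anything from Ubiquity itself; a real system would need proper noun-type detection and would still ask the user to confirm each candidate.</p>

```python
# Hypothetical sketch: propose particle boundaries, but only when the text
# before the candidate particle is a plausible argument for that particle.

# Assumed particle table: particle -> role name (illustrative only).
PARTICLES = {"を": "object", "に": "goal"}

# Assumed stand-in for real noun-type detection: a list of known contacts.
KNOWN_CONTACTS = {"ブレア"}


def is_valid_argument(text, role):
    """Return True if `text` could fill `role` (toy rules, an assumption)."""
    if not text:
        return False
    if role == "goal":
        return text in KNOWN_CONTACTS
    return True  # any non-empty string may be a direct object


def candidate_particles(user_input):
    """Return (index, particle, argument) tuples worth asking the user about."""
    candidates = []
    start = 0  # where the current argument begins
    for i, ch in enumerate(user_input):
        role = PARTICLES.get(ch)
        if role is None:
            continue  # not a particle character at all
        argument = user_input[start:i]
        if is_valid_argument(argument, role):
            candidates.append((i, ch, argument))
            start = i + 1  # assume the user accepts; next argument starts here
    return candidates


print(candidate_particles("ケーキをブレアに送って"))
```

<p>In this sketch a に at the very start of a word is never offered as a particle, because the empty string before it fails the argument check. That is the kind of false positive the validity check is meant to filter out.</p>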

<h3>Smart suggestions: what works, what doesn&#8217;t</h3>

<p>One of the key suggestions in Aza&#8217;s mockups is a way to choose the prepositions while entering your arguments, based on the current verb.</p>

<p>For example, here, the <code>translate</code> command accepts a direct object, a <em>to</em>-object, and a <em>from</em>-object, so little <code>to</code> and <code>from</code> markers magically show up on the right side, making the appropriate prepositions (and by extension the appropriate arguments) discoverable. I think this line of thinking is a really good one, at least for English.</p>

<p><center><a class='limages' rel='lightbox[verbfinal]' href='http://farm4.static.flickr.com/3359/3272673947_05b4a21881_o.jpg'><img src='http://farm4.static.flickr.com/3359/3272673947_14b59c2aa1.jpg'></a></center></p>

<p><strong>In a verb-final language, however, you enter the arguments first and then the verb, making this strategy of suggesting appropriate arguments impossible.</strong> Note that in the user-contributed spreadsheet of <a href="http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/">how languages identify their arguments</a> we see that about a quarter of the languages we looked at are verb-final—that is, with Subject-Object-Verb canonical word order.</p>

<p>Instead of seeing this as a disadvantage, however, let&#8217;s see what verb-final order <em>allows</em> us to do.</p>

<h3>Mockup 2: A different kind of suggestion</h3>

<p>Not all verbs allow for every different kind of particle. For example, it doesn&#8217;t make sense to have a -に (<em>-ni</em>, &#8220;to&#8221; or dative) argument for a verb like 検索して (<em>kensaku-shite</em>, &#8220;search for&#8221;). In English we used this to suggest different types of arguments given a specific verb. In a verb-final language, we could do this <em>backwards</em>.</p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/02/verb-suggestion.png" alt="verb-suggestion.png" border="0" /></center></p>

<p><strong>Pros:</strong> This makes verbs highly discoverable, given a certain argument structure. For example, if you enter a few arguments, like a direct object, a &#8220;to&#8221; argument, and a &#8220;from&#8221; argument, it&#8217;ll suggest verbs that will do something to an object from somewhere to somewhere else. This way, you can easily try out verbs you didn&#8217;t even know existed. It&#8217;ll only give you verbs appropriate for your arguments, reducing the chance of writing an infelicitous command.</p>

<p><strong>Cons:</strong> Without knowing what kinds of actions are available, it may be difficult to know what kinds of arguments to enter in the first place. If you have a specific verb or service you want to use it may be counterintuitive or downright tricky to start by guessing the right set of arguments.</p>

<p>In addition, from a technical point of view, this requires many of the prediction algorithms in English Ubiquity to run backwards. Ideally, there would be a closed (predetermined) class of particles and a predefined set of noun types. Verbs would not be able to define their own modifiers and noun classes as easily or freely as they can now.</p>
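
<p>To make the &#8220;backwards&#8221; direction concrete, here is a minimal Python sketch of suggesting verbs from the particles already entered. The verb table and its particle sets are assumptions made up for illustration; they are not Ubiquity&#8217;s real command definitions.</p>

```python
# Hypothetical sketch: suggest verbs whose argument frame covers the
# particles the user has already typed. The verb table is an assumption.

VERB_FRAMES = {
    "送って": {"を", "に"},            # send: object, goal
    "検索して": {"を"},                # search for: object only
    "翻訳して": {"を", "に", "から"},  # translate: object, goal, source
}


def suggest_verbs(entered_particles):
    """Return the verbs that accept every particle entered so far, sorted."""
    entered = set(entered_particles)
    return sorted(
        verb for verb, frame in VERB_FRAMES.items() if entered.issubset(frame)
    )


print(suggest_verbs(["を", "に"]))
```

<p>Note that the lookup only works because the set of particles is closed and shared by every verb, which is exactly the restriction described above.</p>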

<h3>Conclusion</h3>

<p>The properties and challenges of Japanese grammar require that we not simply copy the English behavior but think about what really makes sense in that language. That may be an important lesson as we move toward designing a localizable Ubiquity. Please post your questions and criticisms of this design or post your own mockups!</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>Happy <a href="http://www.un.org/depts/dhl/language/index.html">International Mother Language Day</a>! ^^&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>


<p>Related posts:<ol><li><a href='http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/' rel='bookmark' title='Permanent Link: Three ways to argue over arguments'>Three ways to argue over arguments</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/' rel='bookmark' title='Permanent Link: Ubiquity Parser: The Next Generation Demo'>Ubiquity Parser: The Next Generation Demo</a></li>
<li><a href='http://mitcho.com/blog/link/ubiquity-in-portuguese/' rel='bookmark' title='Permanent Link: Ubiquity in Portuguese'>Ubiquity in Portuguese</a></li>
</ol></p>
<p>Related posts brought to you by <a href='http://mitcho.com/code/yarpp/'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/feed/</wfw:commentRss>
		<slash:comments>84</slash:comments>
		</item>
		<item>
		<title>Contribute: how your language identifies its arguments</title>
		<link>http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/</link>
		<comments>http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/#comments</comments>
		<pubDate>Wed, 18 Feb 2009 09:37:18 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[coding properties]]></category>
		<category><![CDATA[contribute]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[grammatical relations]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1450</guid>
		<description><![CDATA[Earlier today I blogged on three different strategies languages use to mark the roles of different arguments: word order, marking on the arguments, and marking on the verbs. I gathered some data from the fantastic World Atlas of Language Structures to put together a survey of many of the languages on the Internet. For each [...]


Related posts:<ol><li><a href='http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/' rel='bookmark' title='Permanent Link: Three ways to argue over arguments'>Three ways to argue over arguments</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-i18n-questions-to-ask/' rel='bookmark' title='Permanent Link: Ubiquity i18n: questions to ask'>Ubiquity i18n: questions to ask</a></li>
<li><a href='http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/' rel='bookmark' title='Permanent Link: Adding Your Language to Ubiquity Parser 2'>Adding Your Language to Ubiquity Parser 2</a></li>
</ol>

Related posts brought to you by <a href='http://mitcho.com/code/yarpp/'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>Earlier today <a href="http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/">I blogged on three different strategies</a> languages use to mark the roles of different arguments: word order, marking on the arguments, and marking on the verbs.</p>

<p>I gathered some data from the fantastic <a href="http://wals.info/">World Atlas of Language Structures</a> to put together a survey of many of the languages on the Internet. For each of the languages, I got the canonical word order and whether the language marks the roles of its arguments on the verb and/or the arguments themselves.</p>

<iframe width='605' height='300' frameborder='0' src='http://spreadsheets.google.com/pub?key=pE-nN92qp_pa5P6YbUOw0HQ&#038;output=html'></iframe>

<p>As you can see, there are a number of data points that are still missing. <strong>Please contribute information on the languages you speak!</strong> You can <a href="http://spreadsheets.google.com/ccc?key=pE-nN92qp_pa5P6YbUOw0HQ">edit the spreadsheet on Google Docs</a>. Thanks!</p>


<p>Related posts:<ol><li><a href='http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/' rel='bookmark' title='Permanent Link: Three ways to argue over arguments'>Three ways to argue over arguments</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-i18n-questions-to-ask/' rel='bookmark' title='Permanent Link: Ubiquity i18n: questions to ask'>Ubiquity i18n: questions to ask</a></li>
<li><a href='http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/' rel='bookmark' title='Permanent Link: Adding Your Language to Ubiquity Parser 2'>Adding Your Language to Ubiquity Parser 2</a></li>
</ol></p>
<p>Related posts brought to you by <a href='http://mitcho.com/code/yarpp/'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Three ways to argue over arguments</title>
		<link>http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/</link>
		<comments>http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/#comments</comments>
		<pubDate>Wed, 18 Feb 2009 03:26:05 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[agreement]]></category>
		<category><![CDATA[ambiguity]]></category>
		<category><![CDATA[Ancient Greek]]></category>
		<category><![CDATA[argument structure]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[case]]></category>
		<category><![CDATA[Chinese]]></category>
		<category><![CDATA[coding properties]]></category>
		<category><![CDATA[English]]></category>
		<category><![CDATA[grammatical relations]]></category>
		<category><![CDATA[Hungarian]]></category>
		<category><![CDATA[Japanese language]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mandarin]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[verbs]]></category>
		<category><![CDATA[word order]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1413</guid>
		<description><![CDATA[UPDATE: Contribute information on how your language identifies its arguments here. When we execute a command in Ubiquity, in very simple terms, we&#8217;re hoping to do something (a verb) to some arguments (the nouns). Every sentence in every language uses some method to encode which arguments correspond to which roles of the verb. Here are [...]


Related posts:<ol><li><a href='http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/' rel='bookmark' title='Permanent Link: Contribute: how your language identifies its arguments'>Contribute: how your language identifies its arguments</a></li>
<li><a href='http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/' rel='bookmark' title='Permanent Link: Writing commands with semantic roles'>Writing commands with semantic roles</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/' rel='bookmark' title='Permanent Link: Ubiquity in Firefox: Focus on Japanese'>Ubiquity in Firefox: Focus on Japanese</a></li>
</ol>

Related posts brought to you by <a href='http://mitcho.com/code/yarpp/'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p><em>UPDATE: Contribute information on how your language identifies its arguments <a href="http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/">here</a>.</em></p>

<p>When we execute a command in Ubiquity, in very simple terms, we&#8217;re hoping to do something (a verb) to some arguments (the nouns). Every sentence in every language uses some method to encode which arguments correspond to which roles of the verb. Here are a couple examples:</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
</pre></td><td class="code"><pre class="english" style="font-family:monospace;">He sees Mary.
彼が Maryを 見る。 (Kare-ga Mary-o miru.)</pre></td></tr></table></div>


<p>As speakers of English, you can read sentence (1) above and know exactly who is doing the seeing and who is being seen and speakers of Japanese can get the same information from (2). <strong>How do different languages code for arguments in different roles?</strong> There are, broadly speaking, three different ways:</p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/02/threeways.png" alt="three ways to code for arguments in different roles" border="0" width="536" height="284" /></center></p>

<p>We&#8217;ll take a brief look today at these three different strategies, all of which <a href="http://www.azarask.in/blog/post/scaling-ubiquity-to-60-languages-we-need-your-help/">a localizeable natural language interface</a> will surely encounter.</p>

<p><span id="more-1413"></span></p>

<h3>Word order</h3>

<p>In many languages, the position of the arguments relative to one another and to the verb determines the role each argument plays. Mandarin Chinese is a good example of such a language:</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>3
4
</pre></td><td class="code"><pre class="chinese" style="font-family:monospace;">他 喜欢 Mary (Ta xihuan Mary)
Mary 喜欢 他 (Mary xihuan ta)</pre></td></tr></table></div>


<p>Here, sentence (3) says &#8220;he likes Mary&#8221; while sentence (4) says &#8220;Mary likes him&#8221;. Simply reversing the positions of &#8220;he/him&#8221; and &#8220;Mary&#8221; we&#8217;re able to flip the roles that they fill in the sentence: that of the person who does the liking and the person who is being liked. Now take a look at sentence (5) which means &#8220;John says &#8216;hello&#8217; to Mary.&#8221;</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>5
</pre></td><td class="code"><pre class="chinese" style="font-family:monospace;">John 告诉 Mary &quot;你 好&quot; (John gaosu Mary &quot;ni hao&quot;)</pre></td></tr></table></div>


<p>We note here that, while in English we used a different strategy of marking one argument (we marked the &#8220;Mary&#8221; argument with &#8220;to&#8221;), Chinese doesn&#8217;t mark either of the arguments. There is, however, a clearly defined order to the arguments, which you might encode this way:</p>


<div class="wp_syntax"><div class="code"><pre class="code" style="font-family:monospace;">say [who you're speaking to] [what you're saying]</pre></div></div>


<p>If you swap the order of the two objects in this sentence, it becomes ungrammatical. (<strong>Note:</strong> the asterisk * here means the sentence is <em>ungrammatical</em>.)</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>5
</pre></td><td class="code"><pre class="chinese" style="font-family:monospace;">* John 告诉 &quot;你 好&quot; Mary (John gaosu &quot;ni hao&quot; Mary)</pre></td></tr></table></div>


<p>Here, the word order dictates that &#8220;你好&#8221; must be &#8220;who you&#8217;re speaking to&#8221; and &#8220;Mary&#8221; must be &#8220;what you&#8217;re saying,&#8221; but that doesn&#8217;t make sense, so the sentence is ungrammatical.</p>
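
<p>The positional frame above can be sketched in a few lines of Python. The <code>FRAMES</code> table and the role names are illustrative assumptions, not part of any real parser:</p>

```python
# Hypothetical sketch: in a word-order language, roles come from position.
# The frame mirrors the "say [who you're speaking to] [what you're saying]"
# pattern; the role names are illustrative assumptions.

FRAMES = {"告诉": ["addressee", "message"]}


def assign_roles(subject, verb, objects):
    """Pair each object with the role that its position implies."""
    roles = {"speaker": subject}
    roles.update(zip(FRAMES[verb], objects))
    return roles


print(assign_roles("John", "告诉", ["Mary", "你好"]))
print(assign_roles("John", "告诉", ["你好", "Mary"]))  # roles flip with order
```

<p>The second call assigns &#8220;你好&#8221; as the addressee and &#8220;Mary&#8221; as the message, which is the nonsensical reading that makes the swapped sentence ungrammatical.</p>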

<h3>Marking the arguments</h3>

<p>Another possible strategy is to mark each argument (or some of the arguments) so that each argument&#8217;s role is clear. In many languages this is done with <a href="http://en.wikipedia.org/wiki/case marking">case marking</a>. Take for example this Ancient Greek sentence with its English gloss on line (6). Here, NOM refers to <a href="http://en.wikipedia.org/wiki/nominative case">nominative case</a> and ACC refers to <a href="http://en.wikipedia.org/wiki/accusative case">accusative case</a>.<sup id="fnref:2"><a href="#fn:2" rel="footnote">1</a></sup></p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>5
6
</pre></td><td class="code"><pre class="ancient-greek" style="font-family:monospace;">ho  didaskal-os  paideuei to  paidi-on  (SVO)
the teacher -NOM teaches  the boy  -ACC</pre></td></tr></table></div>


<p>This sentence means &#8220;the teacher instructs the boy.&#8221; While sentence (5) is in Subject-Verb-Object order, all six possible orderings of {subject, verb, object} are grammatical and mean the same thing:<sup id="fnref:1"><a href="#fn:1" rel="footnote">2</a></sup></p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>7
8
9
10
11
</pre></td><td class="code"><pre class="ancient-greek" style="font-family:monospace;">ho didaskalos to paidion paideuei (SOV)
paideuei ho didaskalos to paidion (VSO)
paideuei to paidion ho didaskalos (VOS)
to paidion ho didaskalos paideuei (OSV)
to paidion paideuei ho didaskalos (OVS)</pre></td></tr></table></div>


<p>Many languages also use <a href="http://en.wikipedia.org/wiki/adposition">adpositions</a> (prepositions and/or postpositions) to further clarify the role of an argument in addition to case (like English does) or in lieu of case marking altogether. The idea is the same, though: you want to clarify the roles of the arguments so you morphologically mark the arguments with their roles.</p>
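
<p>By contrast with the word-order case, a parser for a case-marking language can ignore position entirely. The following Python sketch assumes a drastically simplified set of endings taken only from the example sentence; it is not a real model of Ancient Greek morphology.</p>

```python
# Hypothetical sketch: with case marking, roles come from the marking on each
# noun, so word order does not matter. The endings, articles, and verb list
# are simplified assumptions drawn from the example sentence only.

CASE_ENDINGS = {"os": "subject", "on": "object"}
ARTICLES = {"ho", "to"}
VERBS = {"paideuei"}


def assign_roles(sentence):
    """Map each case-marked noun to a role, ignoring word order."""
    roles = {}
    for word in sentence.split():
        if word in ARTICLES:
            continue  # articles carry no role of their own in this sketch
        if word in VERBS:
            roles["verb"] = word
            continue
        for ending, role in CASE_ENDINGS.items():
            if word.endswith(ending):
                roles[role] = word
    return roles


print(assign_roles("to paidion paideuei ho didaskalos"))
```

<p>Any of the six orderings listed above yields the same mapping, which is the sense in which they &#8220;mean the same thing.&#8221;</p>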

<h3>Marking the verb</h3>

<p>Many languages mark the verb with some information about the argument in a certain role, so that we can properly identify the arguments&#8217; roles. This kind of phenomenon is called <em>agreement</em>.</p>

<p>The most common type of verbal agreement is subject agreement, where the verb is marked by a specific form depending on some features of the subject. Anyone who&#8217;s taken French 101 will recognize this verb conjugation paradigm:</p>

<table>
<tr><th></th><th>subject</th><th>être (to be)</th></tr>
<tr><td rowspan='3'>singular</td><td>je (I)</td><td>suis</td></tr>
<tr><td>tu (you)</td><td>es</td></tr>
<tr><td>il/elle (he/she)</td><td>est</td></tr>
<tr><td rowspan='3'>plural</td><td>nous (we)</td><td>sommes</td></tr>
<tr><td>vous (plural you)</td><td>êtes</td></tr>
<tr><td>ils (they)</td><td>sont</td></tr>
</table>

<p>With this paradigm, if you hear or see &#8220;suis&#8221; in a French sentence, you immediately know that &#8220;je&#8221; (<em>I</em>) must be the subject and if you see &#8220;sommes,&#8221; &#8220;nous&#8221; (<em>we</em>) is the subject, etc. <a href="http://en.wikipedia.org/wiki/Standard Average European">Standard Average European</a> languages tend to exhibit this sort of subject-verb agreement.</p>
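
<p>In code, subject agreement amounts to a reverse lookup from verb form to subject. This small Python sketch just inverts the paradigm table above; the function name is a hypothetical choice for illustration.</p>

```python
# Hypothetical sketch: subject agreement lets us recover the subject from the
# verb form alone. The table is copied from the paradigm above.

ETRE_FORMS = {
    "suis": "je",
    "es": "tu",
    "est": "il/elle",
    "sommes": "nous",
    "êtes": "vous",
    "sont": "ils",
}


def subject_for(verb_form):
    """Return the subject pronoun implied by a form of être, or None."""
    return ETRE_FORMS.get(verb_form)


print(subject_for("sommes"))
```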

<p>Features of the subject position aren&#8217;t the only thing that can be marked on the verb, though. Hungarian, for example, has a type of object agreement. Specifically, the verb marks whether the object is definite or not (in linguistics lingo, &#8220;the verb agrees with the object&#8217;s definiteness feature&#8221;).</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>12
13
14
15
</pre></td><td class="code"><pre class="hungarian" style="font-family:monospace;">John lát  egy almát.
John sees an  apple
John látja az  almát.
John sees  the apple</pre></td></tr></table></div>


<p>Notice that in sentence (12) (glossed in (13)) the verb for &#8220;see&#8221; is realized as &#8220;lát,&#8221; while in (14) it&#8217;s &#8220;látja.&#8221; A speaker can use that agreement to see whether the object is definite or not and thus limit the possible object arguments out of all the nouns in the sentence.</p>

<h3>All of the above</h3>

<p><a href='http://www.qwantz.com/'><img src="http://mitcho.com/blog/wp-content/uploads/2009/02/whom.gif" alt="whom.gif" border="0" width="650" height="442" /></a></p>

<p>Most languages do not use only one of these strategies, but a combination of them. English is a very good example. In a sentence like (12) below the main coding of grammatical roles seems to be word order alone. By reversing the word order into (13), we can effectively swap the arguments&#8217; roles.</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>12
13
</pre></td><td class="code"><pre class="english" style="font-family:monospace;">John likes Mary.
Mary likes John.</pre></td></tr></table></div>


<p>However, this doesn&#8217;t work with pronominal arguments. Swapping the arguments in (14) yields (15) which is ungrammatical due to the case marking on the pronouns.</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>14
15
</pre></td><td class="code"><pre class="english" style="font-family:monospace;">He likes her.
* Her likes he.</pre></td></tr></table></div>


<p>In addition, the verb in English must agree with the subject&#8217;s number (singular or plural):</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>16
17
18
</pre></td><td class="code"><pre class="english" style="font-family:monospace;">John likes them.
* They likes John.
They like John.</pre></td></tr></table></div>


<p>In this way, English exhibits all three strategies: word order, case marking, and agreement, although often only word order is actively used to disambiguate the roles of arguments.</p>

<p><strong>Question:</strong> What strategies are used by your language to mark the roles of different arguments?</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:2">
<p>The following example is from <a href="http://www.personal.uni-jena.de/~x4diho/LingTyp%20Grammatical%20relations.ppt">Holger Diessel</a>.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:1">
<p>&#8220;Mean the same thing&#8221; here means that the teacher is always instructing and the boy is always being instructed. The sentences may differ in when or how they are used depending on which argument is being talked about or what the implications of the utterance are. The formal notion is <em>truth-conditional equivalence</em>.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>


<p>Related posts:<ol><li><a href='http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/' rel='bookmark' title='Permanent Link: Contribute: how your language identifies its arguments'>Contribute: how your language identifies its arguments</a></li>
<li><a href='http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/' rel='bookmark' title='Permanent Link: Writing commands with semantic roles'>Writing commands with semantic roles</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/' rel='bookmark' title='Permanent Link: Ubiquity in Firefox: Focus on Japanese'>Ubiquity in Firefox: Focus on Japanese</a></li>
</ol></p>
<p>Related posts brought to you by <a href='http://mitcho.com/code/yarpp/'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
	</channel>
</rss>
