<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>mitcho.com &#187; parser</title>
	<atom:link href="http://mitcho.com/blog/tag/parser/feed/" rel="self" type="application/rss+xml" />
	<link>http://mitcho.com</link>
	<description></description>
	<lastBuildDate>Fri, 10 Feb 2012 23:24:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4-alpha-19719</generator>
		<item>
		<title>Mashing up the browser in Maine</title>
		<link>http://mitcho.com/blog/projects/mashing-up-the-browser-in-maine/</link>
		<comments>http://mitcho.com/blog/projects/mashing-up-the-browser-in-maine/#comments</comments>
		<pubDate>Sat, 19 Dec 2009 19:00:32 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[travelogue]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Jetpack]]></category>
		<category><![CDATA[Maine]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[presentation]]></category>
		<category><![CDATA[slides]]></category>
		<category><![CDATA[talk]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=3233</guid>
		<description><![CDATA[Last week I was invited to give a talk at the TechMaine annual conference in Portland, Maine. Since I had a longer time slot than I had previously used to talk about Ubiquity, I decided to dedicate a good portion of the talk to Jetpack. Being outside of Mozilla for the past few months, this gave me [...]
]]></description>
			<content:encoded><![CDATA[<p>Last week I was invited to give a talk at the <a href="http://www.techmaine.com/ac2009">TechMaine annual conference</a> in Portland, Maine.</p>

<p>Since I had a longer time slot than I had previously used to talk about Ubiquity, I decided to dedicate a good portion of the talk to <a href="http://jetpack.mozillalabs.com">Jetpack</a>. Having been outside of Mozilla for the past few months, I took this as an opportunity to get reacquainted with the Jetpack APIs. I was impressed by how easy it was to develop a quick Jetpack. I ended up preparing two to live-code during the talk: <a href="http://jetpackgallery.mozillalabs.com/jetpacks/207">Helvetica</a>, which, with one click, replaces all fonts on the current page with Helvetica; and <a href="http://jetpackgallery.mozillalabs.com/jetpacks/208">You Are Here</a>, which uses an open API from <a href="http://ipinfodb.com/">IPinfoDB</a> to display, in the status bar, the physical location of the domain you are currently visiting. Both are now on the <a href="http://jetpackgallery.mozillalabs.com/">Jetpack Gallery</a>.</p>

<p><a rel='lightbox' href="http://mitcho.com/blog/wp-content/uploads/2009/12/youarehere.png"><img src="http://mitcho.com/blog/wp-content/uploads/2009/12/youarehere-inset.png" alt="" title="You Are Here" width="464" height="112" class="alignnone size-full wp-image-3237" /></a></p>

<p>Unfortunately there was a bit of a snowstorm leading up to the event, but there was still a nice turnout and I got to meet some fantastic people there. Ken Shoemake of <a href="http://en.wikipedia.org/wiki/slerp">slerp</a> and <a href="http://en.wikipedia.org/wiki/quaternion">quaternion</a> fame came up to me after my talk and said &#8220;the Ubiquity parser reminded me of the dancing bear&#8230; it&#8217;s less surprising that it works well as that it works at all.&#8221; <img src='http://mitcho.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  I also enjoyed the other great presentations in the technology track, covering the <a href="http://www.nofluffjuststuff.com/conference/speaker/brian_sletten">virtues of REST</a> and basic iPhone development.</p>

<p><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" href="http://www.slideshare.net/mitcho/mashup-the-browser-with-ubiquity-and-jetpack" title="Mashup the Browser with Ubiquity and Jetpack">Mashup the Browser with Ubiquity and Jetpack</a><object style="margin:0px" width="600" height="501"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=techmaine-091210174736-phpapp01&#038;stripped_title=mashup-the-browser-with-ubiquity-and-jetpack" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=techmaine-091210174736-phpapp01&#038;stripped_title=mashup-the-browser-with-ubiquity-and-jetpack" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="600" height="501"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/mashing-up-the-browser-in-maine/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Performance vs Responsiveness —or— How I Made the Parser Twice As Fast in One Day</title>
		<link>http://mitcho.com/blog/projects/performance-vs-responsiveness/</link>
		<comments>http://mitcho.com/blog/projects/performance-vs-responsiveness/#comments</comments>
		<pubDate>Thu, 13 Aug 2009 06:16:54 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[response]]></category>
		<category><![CDATA[responsiveness]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[UI]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2712</guid>
		<description><![CDATA[Since we launched Ubiquity 0.5, the issue of Parser 2 performance has been brought up over and over within the community. By virtue of having a more flexible and localizable design, Parser 2 was expected to be slower than our original parser, but its current implementation felt noticeably—perhaps unnecessarily—slow compared to Parser 1. Parser 2 [...]
]]></description>
			<content:encoded><![CDATA[<p>Since we <a href="http://labs.mozilla.com/2009/07/ubiquity-0-5/">launched Ubiquity 0.5</a>, the issue of Parser 2 performance has been brought up <a href="http://groups.google.com/group/ubiquity-firefox/browse_thread/thread/b0dfa649dda77a2c#">over</a> and <a href="http://groups.google.com/group/ubiquity-firefox/browse_thread/thread/13bc9ade35c8b708#">over</a> within the community. By virtue of having a <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2">more flexible and localizable design</a>, Parser 2 was expected to be slower than our original parser, but its current implementation felt noticeably—perhaps unnecessarily—slow compared to Parser 1. Parser 2 performance has been identified as <a href="https://wiki.mozilla.org/Labs/Ubiquity/Meetings/2009-08-05_Weekly_Meeting#Notes">one of the blockers</a> for pushing Ubiquity 0.5+ to all of our 0.1.x users, and has thus been one of my recent foci.</p>

<h3>The short story:</h3>

<p>Inspired by some comments by <a href="http://theunfocused.net">Blair</a>, yesterday I was able to roughly double the speed of Parser 2: parses now take 40-60% less time, depending on the query. This change <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/rev/77156d689b26">has been committed</a> and will be released as part of our forthcoming minor update, Ubiquity 0.5.4. Yay!</p>

<p><span id="more-2712"></span></p>

<h3>The long story: asynchronous parsing</h3>

<p>Given that parsing in Ubiquity, combined with the post-parse step of displaying suggestions, takes a good few dozen milliseconds, it is important that it not block the main execution thread, so that the UI stays responsive throughout. In other words, we needed to make it asynchronous.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<p>When we began work on Parser 2 a few months ago, <a href="http://theunfocused.net">Blair</a> stepped up to the plate to make it run asynchronously. For various reasons, the parser doesn&#8217;t run in a Worker thread, which would give truer threading. Instead, we put the parser&#8217;s steps into a <a href="https://developer.mozilla.org/en/New_in_JavaScript_1.7#Generators_and_iterators">generator</a> called <code>_yieldingParse</code>, with the keyword <code>yield</code> placed at points throughout.</p>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">function</span> _yieldingParse<span style="color: #009900;">&#40;</span>...<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #006600; font-style: italic;">// step 1</span>
  ...
  <span style="color: #660066;">yield</span> <span style="color: #003366; font-weight: bold;">true</span><span style="color: #339933;">;</span>
&nbsp;
  <span style="color: #006600; font-style: italic;">// step 2</span>
  ...
  <span style="color: #009900;">&#123;</span>
    ...
    <span style="color: #660066;">yield</span> <span style="color: #003366; font-weight: bold;">true</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
  ...
&nbsp;
<span style="color: #009900;">&#125;</span></pre></div></div>


<p>We then iterate over a <code>_yieldingParse</code> object in a function called <code>doAsyncParse</code>. Each time we invoke <code>doAsyncParse</code>, it calls <code>next</code>, which advances the parse from the last <code>yield</code> point to the next one. <code>doAsyncParse</code> checks after each step whether we should <code>keepworking</code> and, if so, asynchronously advances to the next step by calling itself with a <code>setTimeout</code>. (Note that the code below is a simplification.)</p>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">var</span> parseGenerator <span style="color: #339933;">=</span> _yieldingParse<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #003366; font-weight: bold;">function</span> doAsyncParse<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #003366; font-weight: bold;">var</span> ok <span style="color: #339933;">=</span> parseGenerator.<span style="color: #660066;">next</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #000066; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>ok <span style="color: #339933;">&amp;&amp;</span> keepworking<span style="color: #009900;">&#41;</span>
    Utils.<span style="color: #660066;">setTimeout</span><span style="color: #009900;">&#40;</span>doAsyncParse<span style="color: #339933;">,</span><span style="color: #CC0000;">0</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #006600; font-style: italic;">// get this party started</span>
Utils.<span style="color: #660066;">setTimeout</span><span style="color: #009900;">&#40;</span>doAsyncParse<span style="color: #339933;">,</span><span style="color: #CC0000;">0</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>
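
<p>For readers on a current JavaScript engine, here is a minimal, runnable sketch of the same pattern using standard generators (<code>function*</code>), where <code>next()</code> returns a <code>{value, done}</code> object rather than throwing at the end as the JavaScript 1.7 generators above do. The names <code>yieldingParse</code>, <code>runAsync</code> and <code>state.keepWorking</code>, and the placeholder work inside each step, are assumptions made for illustration, not Ubiquity code.</p>

```javascript
// Hypothetical stand-in for the parser: each `yield` marks a point where
// control may be handed back to the event loop.
function* yieldingParse(input, results) {
  // step 1: split the input into words (placeholder for real parsing work)
  const words = input.split(/\s+/).filter(Boolean);
  yield;

  // step 2: handle one word per step, yielding inside the loop
  for (const word of words) {
    results.push(word.toUpperCase());
    yield;
  }
}

// Drive the generator one step per timer callback, in the spirit of
// doAsyncParse. `state.keepWorking` plays the role of `keepworking`.
function runAsync(generator, state, onDone) {
  function step() {
    if (!state.keepWorking) return onDone(false); // cancelled
    const { done } = generator.next();
    if (done) return onDone(true);                // ran to completion
    setTimeout(step, 0);
  }
  // get this party started
  setTimeout(step, 0);
}

const results = [];
runAsync(yieldingParse("hello to span", results), { keepWorking: true },
  (finished) => console.log("finished:", finished, results));
```

<p>Setting <code>state.keepWorking</code> to <code>false</code> from anywhere else stops the parse at the next step boundary.</p>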


<p>The more often we <code>yield</code>, the more responsive the UI is. However, each <code>yield</code> carries a certain overhead from the <code>setTimeout</code> that follows it. This point hit home when Blair told me the other day that the parser was much faster without any of the <code>setTimeout</code>s. Indeed, when I ran queries completely synchronously in my own testing (replacing all the <code>setTimeout</code>s), parses ran in roughly half the original time. However, by virtue of being completely synchronous, the parser then locked up the UI entirely.</p>

<p>I thus set out to strike a balance between performance and responsiveness by taking out and moving some of the <code>yield</code>s in our <code>_yieldingParse</code> (<a href="http://ubiquity.mozilla.com/trac/ticket/856">#856</a>).</p>
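
<p>One other way to make the same trade, sketched here as an alternative and not what this change did (it removed and moved <code>yield</code>s), is to keep the <code>yield</code> points fine-grained but run several steps per timer callback until a small time budget is used up. The function <code>runWithBudget</code>, its <code>budgetMs</code> parameter and the injectable <code>now</code> clock are all hypothetical names.</p>

```javascript
// Run generator steps back to back until `budgetMs` has elapsed, then hand
// control back to the event loop. `budgetMs` is an assumed tuning knob and
// `now` is injectable so the behaviour can be checked with a fake clock.
function runWithBudget(generator, budgetMs, onDone, now = Date.now) {
  let ticks = 0; // number of timer callbacks used so far
  function tick() {
    ticks++;
    const start = now();
    do {
      if (generator.next().done) return onDone(ticks);
    } while (now() - start < budgetMs);
    setTimeout(tick, 0); // budget spent: let the UI breathe, then resume
  }
  setTimeout(tick, 0);
}

// Placeholder workload: n cheap steps.
function* countSteps(n) {
  for (let i = 0; i < n; i++) yield i;
}

runWithBudget(countSteps(1000), 4, (ticks) =>
  console.log("finished in " + ticks + " timer callback(s)"));
```

<p>A larger budget means fewer timer callbacks and less overhead, at the cost of longer stretches during which the UI cannot respond.</p>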

<h3>Tests, tests, tests</h3>

<p><a href='http://twitter.com/progrium/status/3273910705'><img src="http://mitcho.com/blog/wp-content/uploads/2009/08/Screen-shot-2009-08-12-at-4.38.50-PM.png" alt="Screen shot 2009-08-12 at 4.38.50 PM.png" border="0" width="641" height="206" /></a></p>

<p>Keeping this in mind, I ran a number of tests as I proceeded with my &#8220;refactoring.&#8221;</p>

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/08/notes1.jpg" alt="notes.jpg" border="0" width="649" height="327" /></p>

<p>Here are some final parse time results:<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup></p>

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/08/beforeafter1.png" alt="beforeafter.png" border="0" width="576" height="344" /></p>

<p>Four different queries (&#8220;hello to span&#8221;, &#8220;goo hello&#8221;, &#8220;22.7&#8221;, &#8220;tw as test&#8221; with a selection context of &#8220;hello world&#8221;) were run using each algorithm. The blue bar is the performance of the algorithm prior to adjustment of <code>setTimeout</code>s—that of Ubiquity 0.5.3. The gold bar is the time from a completely synchronous parse where all the <code>setTimeout</code>s were replaced. This algorithm completely locks up the UI, but is clearly the fastest, and should be seen as a baseline for all other yielding optimizations. The green bar is our new algorithm. As you can see, <strong>the parser is now roughly twice as fast.</strong></p>
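
<p>A note on the arithmetic, since &#8220;faster&#8221; can be read two ways: a parse that takes a fraction <em>r</em> less time is 1/(1 - r) times as fast, so halving the parse time and doubling the speed are the same claim. The helper names below are made up for this illustration.</p>

```javascript
// Convert a fractional reduction in parse time into a speedup factor.
// reduction = 0.5 means the new parse takes half as long as the old one.
function speedupFromReduction(reduction) {
  return 1 / (1 - reduction);
}

// Convert a speedup factor back into the fraction of time saved.
function reductionFromSpeedup(speedup) {
  return 1 - 1 / speedup;
}

for (const percent of [40, 50, 60]) {
  const factor = speedupFromReduction(percent / 100);
  console.log(percent + "% less time is " + factor.toFixed(2) + " times as fast");
}
```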

<p>Moreover, the average time between <code>yield</code>s went from 0.7&#160;ms to 3.9&#160;ms, which should still be no problem for responsiveness.</p>

<h3>Cancellability</h3>

<p>This <code>doAsyncParse</code> is also the key to cancellability of the query. When a user changes the input or closes Ubiquity while a query is in progress, we want to cancel that query as soon as possible so the user and UI can move on. <code>keepworking</code> is set to false when the query is cancelled, but it is only checked between steps, so <code>yield</code>ing early enough and often enough in the parse is important for issues like <a href="http://ubiquity.mozilla.com/trac/ticket/741">keystrokes being lost when typing too fast</a>.</p>

<p>While the parser was indeed <code>yield</code>ing often enough (in fact, more often than necessary) before, I noticed that our first <code>yield</code> was often 15-20&#160;ms into the parse. This was because step 1 of our parse derivation was happening outside of the <code>doAsyncParse</code> loop. By moving this into that loop, I was able to bring this initial synchronous time down to around 10&#160;ms. Of course, setting up the parse generator itself takes a little overhead, so this can never go down to 0, but perhaps this will improve <a href="http://ubiquity.mozilla.com/trac/ticket/741">the keystroke issue</a> as well. <strong>I&#8217;d love to get anecdotal feedback on whether this update improves the disappearing keystrokes issue from 0.5.4 users.</strong></p>
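
<p>The cancellation idea can be sketched as follows, assuming standard generators and a hypothetical <code>startQuery</code> helper in which each new query cancels the one before it. None of these names come from Ubiquity itself.</p>

```javascript
// The query currently in flight, if any.
let currentQuery = null;

// Start a new query; whatever was running before is cancelled.
function startQuery(makeGenerator, onDone) {
  if (currentQuery) currentQuery.keepWorking = false; // cancel the old query
  const query = { keepWorking: true };
  currentQuery = query;
  const generator = makeGenerator();

  function step() {
    if (!query.keepWorking) return onDone("cancelled");
    if (generator.next().done) return onDone("finished");
    setTimeout(step, 0);
  }
  // Even the first step runs from a timer, so startQuery returns at once
  // and no parse work happens synchronously inside the keystroke handler.
  setTimeout(step, 0);
  return query;
}

// Placeholder for real parse steps.
function* slowParse() {
  for (let i = 0; i < 5; i++) yield i;
}

startQuery(slowParse, (status) => console.log("first query:", status));
startQuery(slowParse, (status) => console.log("second query:", status));
```

<p>Because the flag is only read between steps, the longest single step sets the worst-case delay before a cancelled query actually stops.</p>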

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>This is analogous to <a href="http://shawnwilsher.com/archives/279">a recent discussion of the asynchronous AwesomeBar</a>.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:2">
<p>A note on methodology: the Parser 2 Playpen (<a href="chrome://ubiquity/content/playpen.html">chrome://ubiquity/content/playpen.html</a>) was used for all testing and timing. All tests were in Firefox 3.5 on Mac OS X Leopard. My machine is a 2.4&#160;GHz Intel Core 2 Duo MacBook Pro. No other (non-OS/daemon) apps were running. No other tabs were open and no other add-ons were installed.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/performance-vs-responsiveness/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Converting your Ubiquity command to Ubiquity 0.5</title>
		<link>http://mitcho.com/blog/projects/converting-your-ubiquity-command-to-ubiquity-0-5/</link>
		<comments>http://mitcho.com/blog/projects/converting-your-ubiquity-command-to-ubiquity-0-5/#comments</comments>
		<pubDate>Tue, 21 Jul 2009 12:15:37 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2521</guid>
		<description><![CDATA[Converting your Ubiquity command to Ubiquity 0.5 from mitcho on Vimeo. This video walks through the process of converting your Ubiquity commands to Ubiquity 0.5 with Parser 2. For more information, please consult the command conversion tutorial.
]]></description>
			<content:encoded><![CDATA[<p><object width="649" height="365"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=5691107&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=5691107&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="649" height="365"></embed></object><br /><a href="http://vimeo.com/5691107">Converting your Ubiquity command to Ubiquity 0.5</a> from <a href="http://vimeo.com/mitchoyoshitaka">mitcho</a> on <a href="http://vimeo.com">Vimeo</a>.</p>

<p>This video walks through the process of converting your Ubiquity commands to Ubiquity 0.5 with Parser 2. For more information, please consult <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2_API_Conversion_Tutorial">the command conversion tutorial</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/converting-your-ubiquity-command-to-ubiquity-0-5/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Ubiquity Localization: What&#8217;s New, What&#8217;s Next</title>
		<link>http://mitcho.com/blog/projects/ubiquity-localization-whats-new-whats-next/</link>
		<comments>http://mitcho.com/blog/projects/ubiquity-localization-whats-new-whats-next/#comments</comments>
		<pubDate>Thu, 09 Jul 2009 19:45:25 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[command]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[internationalization]]></category>
		<category><![CDATA[l10n]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[localization]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[nountype]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[participate]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[verb]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2432</guid>
		<description><![CDATA[Yesterday we released Ubiquity 0.5, a major update to the already popular Ubiquity platform. Among numerous other features, Ubiquity 0.5 includes the first fruit of months of research on building a multilingual parser and natural language interface. In this blog post I&#8217;ll give a quick overview of new internationalization-related features in Ubiquity 0.5 as well [...]
]]></description>
			<content:encoded><![CDATA[<p>Yesterday we <a href="https://labs.mozilla.com/2009/07/ubiquity-0-5/">released Ubiquity 0.5</a>, a major update to the already popular Ubiquity platform. Among <a href="https://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_0.5_Release_Notes">numerous other features</a>, Ubiquity 0.5 includes the first fruit of <a href="http://mitcho.com/blog/tag/ubiquity/">months of research on building a multilingual parser and natural language interface</a>. In this blog post I&#8217;ll give a quick overview of new internationalization-related features in Ubiquity 0.5 as well as a quick roadmap of future considerations.</p>

<p>Of course, one of the best ways to learn about the new features is to experience them&#8230; try Ubiquity 0.5 now!</p>

<p><a href="https://ubiquity.mozilla.com/xpi/0.5/ubiquity-0.5.xpi" style="cursor:pointer;background: #01d835;border: 1px solid;border-color:#01d835 #4ece71 #4ece71 #01d835;-moz-border-radius:4px;padding:10px;text-transform:uppercase;font-size:1.3em;color:white;text-shadow:#1e792c 1px 1px 1px;">Install now!</a></p>

<p><span id="more-2432"></span></p>

<h3>Preface: What&#8217;s What</h3>

<p>To give users a completely localized experience, there are many different components that need to be made to work with different languages. In a single Ubiquity input, like</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
</pre></td><td class="code"><pre class="en" style="font-family:monospace;">translate hello from English to Spanish</pre></td></tr></table></div>


<p>there are actually several components that all need to be localized before Ubiquity can comprehend the equivalent sentence in a different language. The table below gives a sense of which ones: the parser, verbs, and nountypes.</p>

<table>
<tr><th>input:</th><td>translate</td><td>hello</td><td>from</td><td>English</td><td>to</td><td>Spanish</td></tr>
<tr><th>element type:</th><td>verb</td><td>free argument</td><td>delimiter</td><td>structured argument</td><td>delimiter</td><td>structured argument</td></tr>
<tr><th>component to localize:</th><td>verb name</td><td>&nbsp;</td><td>parser</td><td>nountype</td><td>parser</td><td>nountype</td></tr>
</table>

<h3>What&#8217;s New</h3>

<p>Ubiquity 0.5&#8217;s improved language support can be thought of as the product of two more or less orthogonal developments: the brand-new parser, Parser 2, as well as local command localization support.</p>

<h4>Parser 2</h4>

<p>Parser 2 (née <a href="http://mitcho.com/blog/projects/this-week-on-ubiquity-parser-the-next-generation/">Parser: The Next Generation</a>) is a completely new parser designed to support different languages easily. Taking a serious look at the similarities and differences between different languages, we created a universal <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2">parser design</a> which takes a minimal set of settings for particular languages to &#8220;learn&#8221; that language&#8217;s grammar.</p>

<p>The key insight behind Parser 2&#8217;s design is that languages handle the limited range of inputs Ubiquity should understand in remarkably similar ways. The inputs we&#8217;re dealing with here are all commands or actions without quantification or negation. Each consists of a single verb and a series of arguments, with certain markings to designate their roles in the sentence. For example, here&#8217;s our sample Ubiquity input:</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
</pre></td><td class="code"><pre class="en" style="font-family:monospace;">translate hello from English to Spanish</pre></td></tr></table></div>


<p>In this example, &#8220;translate&#8221; is the verb, which we recognize by looking at our bank of known verbs, and the rest of the input can be split up into three different arguments: &#8220;hello,&#8221; &#8220;from English,&#8221; and &#8220;to Spanish.&#8221; Of these, the markers &#8220;from&#8221; and &#8220;to&#8221; tell us that &#8220;English&#8221; is a <em>source</em> of some sort and &#8220;Spanish&#8221; is a <em>goal</em>, while the unmarked &#8220;hello&#8221; is simply an <em>object</em>—the target of the action. By identifying arguments by these abstract <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2/Semantic_Roles"><em>semantic roles</em></a>, we&#8217;re able to quickly identify different kinds of arguments in different languages. For example, the following is the exact same example but using the Japanese syntax and markers:</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>2
</pre></td><td class="code"><pre class="en" style="font-family:monospace;">helloをEnglishからSpanishにtranslate</pre></td></tr></table></div>


<p>Ubiquity knows what the different markers mean in Japanese, like &#8220;を&#8221; > <code>object</code>, &#8220;から&#8221; > <code>source</code>, &#8220;に&#8221; > <code>goal</code>, and can easily interpret this to mean the exact same command as (1). With just a few lines of code, <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2/Localization_Tutorial">you can teach</a> Ubiquity how to recognize these different semantic roles in your language. This innovation also means that Ubiquity commands can be <a href="http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/">written once for one language and automatically used with another language&#8217;s parser</a>, bringing us half-way to the goal of command localization.</p>

<p>Note also that Japanese (as in example (2)) is verb-final and uses no spaces between words. We&#8217;ve tried to make Parser 2 itself agnostic to these kinds of differences between languages.</p>
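
<p>To make the idea concrete, here is a toy sketch of marker-based role assignment. It is not Parser 2&#8217;s actual algorithm or settings format: the <code>languages</code> table, its <code>tokenize</code>, <code>verbFinal</code> and <code>markerAfter</code> fields, and the <code>parse</code> function are all assumptions for illustration, and it only handles single-word arguments.</p>

```javascript
// Per-language settings: marker -> semantic role, plus assumed flags for
// word order and a tokenizer suited to the writing system.
const languages = {
  en: {
    roles: new Map([["from", "source"], ["to", "goal"]]),
    tokenize: (input) => input.trim().split(/\s+/),
    verbFinal: false,   // the verb comes first
    markerAfter: false, // "from English": marker precedes its argument
  },
  ja: {
    roles: new Map([["を", "object"], ["から", "source"], ["に", "goal"]]),
    // No spaces between words, so split on the markers themselves.
    tokenize: (input) => input.trim().split(/(を|から|に)/).filter(Boolean),
    verbFinal: true,    // the verb comes last
    markerAfter: true,  // "Englishから": marker follows its argument
  },
};

function parse(input, langCode) {
  const lang = languages[langCode];
  const tokens = lang.tokenize(input);
  const verb = lang.verbFinal ? tokens.pop() : tokens.shift();
  const args = {};
  for (let i = 0; i < tokens.length; i++) {
    const here = tokens[i];
    const next = tokens[i + 1];
    if (lang.markerAfter && lang.roles.has(next)) {
      args[lang.roles.get(next)] = here; // argument, then its marker
      i++;
    } else if (!lang.markerAfter && lang.roles.has(here)) {
      args[lang.roles.get(here)] = next; // marker, then its argument
      i++;
    } else {
      args.object = here;                // unmarked argument
    }
  }
  return { verb, args };
}

console.log(parse("translate hello from English to Spanish", "en"));
console.log(parse("helloをEnglishからSpanishにtranslate", "ja"));
```

<p>Both calls produce the same verb and the same role assignments, which is what lets one command definition serve both languages.</p>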

<p>Parser 2 also adds <a href="http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/">better argument-first suggestions</a>, inspired by some <a href="http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/">earlier thoughts on Ubiquity in Japanese</a>. Ubiquity will now start to parse arguments in the input even if a verb isn&#8217;t found, and suggest verbs based on that input. For example, if you enter &#8220;hello to Spanish,&#8221; it&#8217;ll recognize that you have an <em>object</em> of &#8220;hello&#8221; and a <em>goal</em> of &#8220;Spanish&#8221; which can be understood as a language name, so it&#8217;ll suggest the verb &#8220;translate.&#8221; This is the way it should be. <img src='http://mitcho.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
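<p>The argument-first idea can be sketched roughly as follows. This is an illustration under assumptions, not the real suggestion code: the <code>accepts</code> method, the two toy nountypes, and the <code>email</code> verb are all hypothetical.</p>

```javascript
// Toy nountypes: each one only decides whether it accepts a given string.
const anyText = { accepts: text => text.length > 0 };
const languageName = {
  accepts: text => ['english', 'spanish', 'japanese'].includes(text.toLowerCase()),
};
const emailAddress = { accepts: text => text.includes('@') };

// Hypothetical verb bank: each verb lists the roles it takes and the
// nountype expected for each role.
const VERBS = [
  { name: 'translate', roles: { object: anyText, source: languageName, goal: languageName } },
  { name: 'email', roles: { object: anyText, goal: emailAddress } },
];

// Suggest every verb whose declared roles can accommodate all of the
// arguments that were parsed, even though no verb was typed.
function suggestVerbs(args, verbs) {
  return verbs
    .filter(verb =>
      Object.entries(args).every(([role, text]) =>
        Object.prototype.hasOwnProperty.call(verb.roles, role) &&
        verb.roles[role].accepts(text)))
    .map(verb => verb.name);
}

// "hello to Spanish" parsed without a verb: an object and a goal.
console.log(suggestVerbs({ object: 'hello', goal: 'Spanish' }, VERBS));
```

<p>Here <code>email</code> is filtered out because &#8220;Spanish&#8221; does not look like an address, leaving <code>translate</code> as the only suggestion.</p>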

<p><small>For more information and background, feel free to check out some of my previous blog posts <a href="http://mitcho.com/blog/tag/ubiquity+parser/">on the new parser</a> and <a href="http://mitcho.com/blog/tag/ubiquity+linguistics/">on the different linguistic considerations</a>. I also have a four-page academic paper giving an overview of some innovations in the parser—email me at <code>x@x.com</code> where <code>x=mitcho</code> if you&#8217;d like to get a copy.</small></p>

<h4>Internationalization of bundled commands</h4>

<p>The move to use <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2/Semantic_Roles">semantic roles</a> in the <a href="https://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_Source_Tip_Author_Tutorial">new command API</a>, described above, means that the same Ubiquity command code can be used with inputs in different languages. Two things are left, then, to make a completely localized input work: (1) translation (localization) of different strings in the commands and (2) localization of the nountypes.</p>

<p>In Ubiquity 0.5, we built a localization infrastructure for commands (1, above) but have not yet tackled the nountypes (2). Ubiquity 0.5 uses the <a href="http://en.wikipedia.org/wiki/gettext">gettext</a> <code>po</code> (portable object) file format for localizations, which many localizers in the UNIX world are very familiar with. This <a href="http://groups.google.com/group/ubiquity-i18n/browse_thread/thread/79c7cea117ad04bb#">choice of file format</a> potentially opens Ubiquity localization up to many who are new to localization or are unfamiliar with other Mozilla localization processes. Ubiquity is able to produce localization templates by itself, and we also have <a href="http://geeksbynature.dk/?p=35">a great tool</a> to check the completeness of different localizations.</p>
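<p>For readers who have not seen a <code>po</code> file, here is a small sketch of what string lookup against one could look like. The catalog entries below are made up for illustration, the <code>parsePo</code> and <code>localize</code> helpers are hypothetical, and the parser only handles single-line <code>msgid</code>/<code>msgstr</code> pairs (no <code>msgctxt</code>, plural forms, escapes, or multi-line strings). How Ubiquity itself loads these files is not shown here.</p>

```javascript
// A made-up two-entry catalog in gettext po syntax.
const samplePo = [
  '# Hypothetical Japanese localization',
  'msgid "translate"',
  'msgstr "訳す"',
  '',
  'msgid "Translates from one language to another."',
  'msgstr ""',
].join('\n');

// Collect msgid -> msgstr pairs, skipping comments and blank lines.
function parsePo(source) {
  const catalog = new Map();
  let msgid = null;
  for (const line of source.split('\n')) {
    const match = /^(msgid|msgstr)\s+"(.*)"\s*$/.exec(line.trim());
    if (!match) continue;
    if (match[1] === 'msgid') {
      msgid = match[2];
    } else if (msgid !== null) {
      // An empty msgstr means "not translated yet", so it is not stored.
      if (msgid !== '' && match[2] !== '') catalog.set(msgid, match[2]);
      msgid = null;
    }
  }
  return catalog;
}

// Fall back to the original string when no translation exists.
function localize(catalog, text) {
  return catalog.has(text) ? catalog.get(text) : text;
}

const catalog = parsePo(samplePo);
console.log(localize(catalog, 'translate'));
console.log(localize(catalog, 'Translates from one language to another.'));
```

<p>Falling back to the original string for untranslated entries mirrors how gettext treats an empty <code>msgstr</code>, so a partial localization degrades gracefully rather than showing blanks.</p>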

<p>A huge caveat, however, is that this localization support currently only works with the commands bundled with Ubiquity itself.</p>

<h3>What&#8217;s Next</h3>

<p>We&#8217;re going to continue working to make Ubiquity <a href="http://mitcho.com/blog/projects/how-natural-should-a-natural-interface-be/">more natural</a> for more users. The tasks we have ahead of us are the localization of nountypes and community commands.</p>

<h4>Nountype localization</h4>

<p>With the new semantic role argument specifications, command localization simply became a question of translating some strings, which many localizers are used to. After all, we want localizations to affect the <em>presentation</em> of commands, not the logic of the commands. When it comes to nountypes, however, it is quite possible that we would actually want the nountype localization to affect its behavior.</p>

<p>Consider, for example, an imaginary <code>day_of_the_week</code> nountype. In English, this nountype might accept or suggest strings like &#8220;Monday&#8221; or &#8220;Tuesday,&#8221; while a French localization would accept &#8220;lundi&#8221; or &#8220;mardi.&#8221; More complicated still, consider a <code>date</code> nountype. In English this nountype may have custom logic to parse strings like &#8220;June 1st&#8221; while another language may have to parse very different kinds of strings. These nountype localizations thus involve not just string translations, but actual changes in their <em>logic</em>, making the <code>po</code> format approach we took to command localization a poor fit.</p>
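<p>To make the difference concrete, here is a sketch of an imaginary <code>date</code> nountype in two languages. Everything in it is an assumption for illustration: the <code>_name</code> and <code>suggest()</code> shape, the choice of Japanese as the second language, and the returned <code>{ month, day }</code> objects.</p>

```javascript
const MONTHS = ['january', 'february', 'march', 'april', 'may', 'june', 'july',
  'august', 'september', 'october', 'november', 'december'];

// Two hypothetical localizations of the same nountype. Note that they differ
// in parsing logic, not just in translated strings.
const dateNountype = {
  en: {
    _name: 'date',
    // English: a month name followed by a day with an optional ordinal
    // suffix, e.g. "June 1st".
    suggest(text) {
      const match = /^([a-z]+)\s+(\d{1,2})(?:st|nd|rd|th)?$/i.exec(text.trim());
      if (!match) return [];
      const month = MONTHS.indexOf(match[1].toLowerCase()) + 1;
      return month === 0 ? [] : [{ month, day: Number(match[2]) }];
    },
  },
  ja: {
    _name: '日付',
    // Japanese: numeric month and day with the 月 and 日 suffixes, e.g. "6月1日".
    suggest(text) {
      const match = /^(\d{1,2})月(\d{1,2})日$/.exec(text.trim());
      return match ? [{ month: Number(match[1]), day: Number(match[2]) }] : [];
    },
  },
};

console.log(dateNountype.en.suggest('June 1st'));
console.log(dateNountype.ja.suggest('6月1日'));
```

<p>Neither regular expression could be derived from the other by translating strings, which is why a string catalog alone does not cover this case.</p>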

<p>Making nountypes localizable, however, will make Ubiquity significantly more &#8220;natural&#8221; for many users. In the coming weeks and months we&#8217;ll be discussing and debating different options to accomplish this.</p>

<h4>Community command localization</h4>

<p>Even though the file format and infrastructure for command localization itself have been fleshed out with Ubiquity 0.5, the distributed nature of all these community commands adds an additional complication. Do we want community command localizations to be completely distributed, or should they be centralized? If they&#8217;re distributed, how do you find them? These are the types of questions we&#8217;ll need to ask and answer. The ease of creating a new Ubiquity command and sharing it with the world is a huge asset of the platform, so we&#8217;ll definitely be thinking about how best to localize these community commands as well. In the next day or two I&#8217;ll be writing up a more detailed blog post on what we need from a good community command localization solution.</p>

<h3>Summary</h3>

<p>For the more visually inclined (including myself), here&#8217;s a handy diagram to summarize what components are localizable now, what will be in the future, and what this means for Ubiquity users of different languages.</p>

<table>
<tr><th rowspan='2'>localized components</th><th rowspan='2'>Japanese input that Ubiquity will understand</th><th colspan='2'>support coverage</th></tr>
<tr><th>for bundled commands</th><th>for community commands</th></tr>
<tr><th><i>no localization</i></th><td>translate hello from English to Spanish</td><th rowspan='3' style='background: #99ff99'>Ubiquity 0.5!</th><th rowspan='2' style='background: #99ff99'>Ubiquity 0.5!</th></tr>
<tr><th>parser</th><td>helloをEnglishからSpanishにtranslate</td></tr>
<tr><th>parser + verbs</th><td>helloをEnglishからSpanishに訳す</td><th rowspan='2' style='background: #f99'><i>the future</i></th></tr>
<tr><th>parser + verbs + nountypes</th><td>helloを英語からスペイン語に訳す</td><th rowspan='1' style='background: #f99'><i>the future</i></th></tr>
</table>

<h3>Get Involved</h3>

<p>Whether you&#8217;re a speaker of an as-yet unsupported language, a veteran localization contributor, or simply interested in seeing how we can offer this natural language interface to more languages and more users, there are lots of ways to get involved. If you have some JavaScript experience and want to teach Ubiquity your native language&#8217;s grammar, read our <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2/Localization_Tutorial">parser localization tutorial</a>. If you would like to contribute localizations for our built-in commands, there&#8217;s a <a href="https://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_0.5_Command_Localization_Tutorial">command localization tutorial</a>. To discuss how best to localize nountypes and community commands, or to ask questions about command and parser localization, join us on the <a href="http://groups.google.com/group/ubiquity-i18n">Ubiquity-i18n mailing list</a>.</p>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/localizing-commands-for-ubiquity-0-5/' rel='bookmark' title='Localizing Commands for Ubiquity 0.5'>Localizing Commands for Ubiquity 0.5</a></li>
<li><a href='http://mitcho.com/blog/projects/localizing-ubiquity-commands-and-nountypes/' rel='bookmark' title='Localizing Ubiquity: commands and nountypes'>Localizing Ubiquity: commands and nountypes</a></li>
<li><a href='http://mitcho.com/blog/projects/big-issues-and-small-issues-with-parser-2/' rel='bookmark' title='Big Issues and Small Issues with Parser 2'>Big Issues and Small Issues with Parser 2</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/ubiquity-localization-whats-new-whats-next/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Ubiquity 0.5 日本語紹介ビデオ</title>
		<link>http://mitcho.com/blog/projects/ubiquity-0-5-%e6%97%a5%e6%9c%ac%e8%aa%9e%e7%b4%b9%e4%bb%8b%e3%83%93%e3%83%87%e3%82%aa/</link>
		<comments>http://mitcho.com/blog/projects/ubiquity-0-5-%e6%97%a5%e6%9c%ac%e8%aa%9e%e7%b4%b9%e4%bb%8b%e3%83%93%e3%83%87%e3%82%aa/#comments</comments>
		<pubDate>Thu, 02 Jul 2009 09:08:03 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[interface]]></category>
		<category><![CDATA[Japanese language]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[screencast]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2389</guid>
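		<description><![CDATA[In preparation for Ubiquity 0.5, the latest version of Ubiquity, which is being released tonight, I put together a Ubiquity screencast in Japanese. Ubiquity 0.5 is a release with a particular emphasis on multilingual support, and the commands bundled with Ubiquity can now be used in Japanese and Danish. Please give it a try and install it! P.S.: As of July 3, the plan is now to delay the release of Ubiquity 0.5, so unfortunately it will not be released today. Please install it once it is released. Ubiquity 0.5 日本語紹介ビデオ from mitcho on Vimeo. As Ubiquity 0.5 will be released soon (Thursday morning in Mountain View), I decided it was a good time to put together a screencast in Japanese demoing the use of the new [...]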
]]></description>
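			<content:encoded><![CDATA[<p>In preparation for <a href="http://ubiquity.mozilla.com">Ubiquity</a> 0.5, the latest version of Ubiquity, which is being released tonight, I put together a Ubiquity screencast in Japanese. Ubiquity 0.5 is a release with a particular emphasis on multilingual support, and the commands bundled with Ubiquity can now be used in Japanese and Danish. Please give it a try and <a href="http://ubiquity.mozilla.com">install</a> it!</p>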

<p><b>P.S.:</b> As of July 3, the plan is now to <a href="http://groups.google.com/group/ubiquity-firefox/browse_thread/thread/9073295d0281f768">delay the release</a> of Ubiquity 0.5, so unfortunately it will not be released today. Please install it once it is released.</p>

<p><object width="649" height="365"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=5420966&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=5420966&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="649" height="365"></embed></object><br /><a href="http://vimeo.com/5420966">Ubiquity 0.5 日本語紹介ビデオ</a> from <a href="http://vimeo.com/mitchoyoshitaka">mitcho</a> on <a href="http://vimeo.com">Vimeo</a>.</p>

<p>As Ubiquity 0.5 will be released soon (Thursday morning in Mountain View), I decided it was a good time to put together a screencast in Japanese demoing the use of the new Japanese parser and commands.</p>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/changes-to-ubiquity-parser-2-and-the-playpen/' rel='bookmark' title='Changes to Ubiquity Parser 2 and the Playpen'>Changes to Ubiquity Parser 2 and the Playpen</a></li>
<li><a href='http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/' rel='bookmark' title='Foxkeh demos Ubiquity Parser: The Next Generation'>Foxkeh demos Ubiquity Parser: The Next Generation</a></li>
<li><a href='http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/' rel='bookmark' title='A Demonstration of Ubiquity Parser 2'>A Demonstration of Ubiquity Parser 2</a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/ubiquity-0-5-%e6%97%a5%e6%9c%ac%e8%aa%9e%e7%b4%b9%e4%bb%8b%e3%83%93%e3%83%87%e3%82%aa/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Ubiquity presentation at Tokyo 2.0</title>
		<link>http://mitcho.com/blog/projects/ubiquity-presentation-at-tokyo-20/</link>
		<comments>http://mitcho.com/blog/projects/ubiquity-presentation-at-tokyo-20/#comments</comments>
		<pubDate>Wed, 10 Jun 2009 09:54:13 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[life]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[bilingual]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[English]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[GoaP]]></category>
		<category><![CDATA[Japan]]></category>
		<category><![CDATA[Japanese language]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[screencast]]></category>
		<category><![CDATA[Tokyo]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2203</guid>
		<description><![CDATA[This past Monday I presented at Tokyo 2.0, Japan&#8217;s largest bilingual web/tech community. I presented as part of a session on The Web and Language, which I also helped organize. Other presenters included Junji Tomita from goo Labs, Shinjyou Sunao of Knowledge Creation, developers of the Voice Delivery System API, and Chris Salzberg of Global [...]
]]></description>
			<content:encoded><![CDATA[<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/06/t2p01.png" alt="T2P0.PNG" border="0" width="211" height="120" /></p>

<p>This past Monday I presented at <a href="http://www.tokyo2point0.net/events/tokyo-20-25-the-web-language">Tokyo 2.0</a>, Japan&#8217;s largest bilingual web/tech community. I presented as part of a session on The Web and Language, which I also helped organize. Other presenters included Junji Tomita from <a href="http://labs.goo.ne.jp/intl/">goo Labs</a>, Shinjyou Sunao of <a href="http://www.knowlec.com/">Knowledge Creation</a>, developers of the <a href="http://www.vdsapi.ne.jp/">Voice Delivery System</a> API, and <a href="http://globalvoicesonline.org/author/chris-salzberg/">Chris Salzberg</a> of <a href="http://globalvoicesonline.org/">Global Voices Online</a> on community translation.</p>

<p>I just put together a video of my Ubiquity presentation, mixing <a href="http://www.ustream.tv/recorded/1625213">the audio recorded live</a> at the presentation together with a screencast of my slides for better visibility. The presentation is 10 minutes long and is bilingual, English and Japanese.</p>

<p><object width="649" height="365"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=5091071&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=5091071&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="649" height="365"></embed></object><br /><a href="http://vimeo.com/5091071">Ubiquity: Command the Web with Language 言葉で操作する Web</a> from <a href="http://vimeo.com/mitchoyoshitaka">mitcho</a> on <a href="http://vimeo.com">Vimeo</a>.</p>

<p><span id="more-2203"></span>
The event also coincided with <a href="http://www.linkedin.com/in/davemcclure">Dave McClure&#8217;s</a> <a href="http://www.geeksonaplane.com/">Geeks on a Plane</a> Asia tour, attracting even more interest to the event. In the end it was the largest Tokyo 2.0 event ever.</p>

<p>As I <a href="http://twitter.com/mitchoyoshitaka/status/1980687478">leave Tokyo next month</a>, I&#8217;ll be sad that I can no longer be a part of Tokyo 2.0. I&#8217;ve met a lot of fascinating people and learned a lot at the monthly events. I&#8217;ll definitely make sure to fit them into my future trips back to Japan, and I highly recommend that any of you who travel to Tokyo do the same.</p>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/changes-to-ubiquity-parser-2-and-the-playpen/' rel='bookmark' title='Changes to Ubiquity Parser 2 and the Playpen'>Changes to Ubiquity Parser 2 and the Playpen</a></li>
<li><a href='http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/' rel='bookmark' title='Foxkeh demos Ubiquity Parser: The Next Generation'>Foxkeh demos Ubiquity Parser: The Next Generation</a></li>
<li><a href='http://mitcho.com/blog/life/notes-from-barcamp-tokyo-2009/' rel='bookmark' title='Notes from BarCamp Tokyo 2009'>Notes from BarCamp Tokyo 2009</a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/ubiquity-presentation-at-tokyo-20/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Changes to Ubiquity Parser 2 and the Playpen</title>
		<link>http://mitcho.com/blog/projects/changes-to-ubiquity-parser-2-and-the-playpen/</link>
		<comments>http://mitcho.com/blog/projects/changes-to-ubiquity-parser-2-and-the-playpen/#comments</comments>
		<pubDate>Fri, 05 Jun 2009 08:21:47 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[screencast]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2173</guid>
		<description><![CDATA[Here&#8217;s a quick screencast highlighting some of the changes to Parser 2 and the updated Parser 2 Playpen. This video should be particularly useful to people hoping to add their language to Parser 2. It&#8217;s also a good reference for Ubiquity core developers. Changes to Ubiquity Parser 2 + Playpen from mitcho on Vimeo. All [...]
]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a quick screencast highlighting some of the changes to Parser 2 and the updated <a href="chrome://parser-demo/content/index.html">Parser 2 Playpen</a>. This video should be particularly useful to people hoping to <a href="http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/">add their language to Parser 2</a>. It&#8217;s also a good reference for Ubiquity core developers.</p>

<p><object width="649" height="365"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=5013787&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=5013787&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="649" height="365"></embed></object><br /><a href="http://vimeo.com/5013787">Changes to Ubiquity Parser 2 + Playpen</a> from <a href="http://vimeo.com/mitchoyoshitaka">mitcho</a> on <a href="http://vimeo.com">Vimeo</a>.</p>

<p>All the features covered, as with all Parser 2 features, require that you <a href="https://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_0.1_Development_Tutorial">get the latest Ubiquity code</a> from our Mercurial repository.</p>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/' rel='bookmark' title='Foxkeh demos Ubiquity Parser: The Next Generation'>Foxkeh demos Ubiquity Parser: The Next Generation</a></li>
<li><a href='http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/' rel='bookmark' title='A Demonstration of Ubiquity Parser 2'>A Demonstration of Ubiquity Parser 2</a></li>
<li><a href='http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/' rel='bookmark' title='Adding Your Language to Ubiquity Parser 2'>Adding Your Language to Ubiquity Parser 2</a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/changes-to-ubiquity-parser-2-and-the-playpen/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Big Issues and Small Issues with Parser 2</title>
		<link>http://mitcho.com/blog/projects/big-issues-and-small-issues-with-parser-2/</link>
		<comments>http://mitcho.com/blog/projects/big-issues-and-small-issues-with-parser-2/#comments</comments>
		<pubDate>Wed, 20 May 2009 03:05:53 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[Dutch]]></category>
		<category><![CDATA[German]]></category>
		<category><![CDATA[Greek]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[internationalization]]></category>
		<category><![CDATA[l10n]]></category>
		<category><![CDATA[localization]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2087</guid>
		<description><![CDATA[Jono and I had a good conversation this morning on IRC about the remaining Big Issues which are blocking the release of Parser 2 as the default parser for Ubiquity. Here are our Top 4 Big Issues: Some commands&#8217; preview and execute methods are not working properly (trac #652). This could be an underlying issue with [...]
]]></description>
			<content:encoded><![CDATA[<p><a href="http://jonoscript.wordpress.com/">Jono</a> and I had a good conversation this morning on <a href="irc://irc.mozilla.org/ubiquity">IRC</a> about the remaining Big Issues which are blocking the release of Parser 2 as the default parser for <a href="http://ubiquity.mozilla.com">Ubiquity</a>. Here are our <strong>Top 4 Big Issues</strong>:</p>

<ol>
<li><strong>Some commands&#8217; <code>preview</code> and <code>execute</code> methods are not working properly</strong> (<a href="http://ubiquity.mozilla.com/trac/ticket/653">trac #652</a>). This could be an underlying issue with some pipes not being rerouted correctly in Parser 2, or it could be that the commands have not been rewritten correctly to take advantage of Parser 2.</li>
<li><strong>Flesh out how to localize resources, like commands and nountypes.</strong> We <a href="http://groups.google.com/group/ubiquity-i18n/browse_thread/thread/ab4d876b1ea02d4">started a conversation</a> on this subject a few weeks ago but we never reached a resolution. This blocks issues 3 and 4 below.</li>
<li><strong>We need to standardize a format for commands for Parser 2.</strong> As <a href="https://wiki.mozilla.org/Labs/Ubiquity/Meetings/2009-05-13_Weekly_Meeting#Notes">noted in last week&#8217;s meeting</a> (among other places), Parser 2 will require at least some modification to all commands. Jono and I came up with a simple hybrid format for commands which specifies <code>takes</code> and <code>modifiers</code> for Parser 1 and <code>arguments</code> for Parser 2, but until we figure out how exactly the localization of commands will work, we can&#8217;t write a definitive standard.</li>
<li><strong>Enable nountype localization.</strong> While the most popular nountypes used are those that ship with Ubiquity, it is important to come up with a localization process which can apply to custom nountypes as well. Nountype localizations need the ability to either (1) replace the <code>_name</code> only, or (2) replace both the <code>_name</code> and the <code>suggest()</code> logic, as both cases will be necessary.</li>
</ol>
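<p>The two nountype localization cases in Big Issue 4 could be sketched along these lines. This is only one possible shape for the idea, not a proposal that has been agreed on: <code>localizeNountype</code>, the two sample nountypes, and the French strings are all hypothetical, and only the <code>_name</code> and <code>suggest()</code> property names come from the list above.</p>

```javascript
// Hypothetical base nountypes.
const urlNountype = {
  _name: 'url',
  suggest: text => (/^https?:\/\//.test(text) ? [text] : []),
};
const dayOfTheWeek = {
  _name: 'day of the week',
  suggest(text) {
    const days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
      'Saturday', 'Sunday'];
    return days.filter(d => d.toLowerCase().startsWith(text.toLowerCase()));
  },
};

// Build a localized copy of a nountype. The localization must supply _name
// and may optionally supply its own suggest(); otherwise the base logic is
// reused unchanged.
function localizeNountype(base, localization) {
  return {
    ...base,
    _name: localization._name,
    suggest: localization.suggest || base.suggest,
  };
}

// Case (1): only the name changes; URLs look the same in every language.
const urlFr = localizeNountype(urlNountype, { _name: 'adresse web' });

// Case (2): both the name and the suggestion logic change.
const dayFr = localizeNountype(dayOfTheWeek, {
  _name: 'jour de la semaine',
  suggest(text) {
    const jours = ['lundi', 'mardi', 'mercredi', 'jeudi', 'vendredi',
      'samedi', 'dimanche'];
    return jours.filter(j => j.startsWith(text.toLowerCase()));
  },
});

console.log(urlFr._name, urlFr.suggest('http://mitcho.com'));
console.log(dayFr._name, dayFr.suggest('lu'));
```

<p>Because the base object is copied rather than mutated, the original nountype stays available as a fallback for languages without a localization.</p>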

<p>Given that Big Issue 3 and Big Issue 4 are both dependent on Big Issue 2, there clearly needs to be a continued public discussion of how we should make these resources localizable. <strong>I look forward to this discussion taking place <a href="https://wiki.mozilla.org/Labs/Ubiquity/Meetings/2009-05-20_Weekly_Meeting">at tomorrow&#8217;s joint (general + i18n) Ubiquity meeting</a>.</strong></p>

<p>In other news, here are some <strong><small>Small Issues</small></strong>:</p>

<ol>
<li><strong>Add a switch for parser version and language settings</strong>: Jono&#8217;s already made a space for this in the new &#8220;Settings and Skins&#8221; page in <code>about:ubiquity</code>. He&#8217;s on it. Like a bonnet.</li>
<li><strong>Magic word (anaphor) substitution is not yet working properly.</strong> This needs to work both when there is an explicit magic word and when there are simply missing arguments.</li>
<li><strong>The position of suggested verbs is always sentence-initial</strong> (<a href="http://ubiquity.mozilla.com/trac/ticket/655">trac #655</a>). Fixing this also requires a way to specify whether verb name localizations are sentence-initial or sentence-final forms.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></li>
<li><a href="http://ubiquity.mozilla.com/trac/search?ticket=on&amp;q=new-parser">&#8230;</a></li>
</ol>

<p>Let&#8217;s hit the code!</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>German, Dutch, and Greek, for example, are all languages that have both sentence-initial and sentence-final command verb forms.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/this-week-on-ubiquity-parser-the-next-generation/' rel='bookmark' title='This week on Ubiquity Parser: The Next Generation'>This week on Ubiquity Parser: The Next Generation</a></li>
<li><a href='http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/' rel='bookmark' title='A Demonstration of Ubiquity Parser 2'>A Demonstration of Ubiquity Parser 2</a></li>
<li><a href='http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/' rel='bookmark' title='Adding Your Language to Ubiquity Parser 2'>Adding Your Language to Ubiquity Parser 2</a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/big-issues-and-small-issues-with-parser-2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Notes from BarCamp Tokyo 2009</title>
		<link>http://mitcho.com/blog/life/notes-from-barcamp-tokyo-2009/</link>
		<comments>http://mitcho.com/blog/life/notes-from-barcamp-tokyo-2009/#comments</comments>
		<pubDate>Mon, 18 May 2009 05:40:54 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[life]]></category>
		<category><![CDATA[BarCamp]]></category>
		<category><![CDATA[cereling]]></category>
		<category><![CDATA[discussion]]></category>
		<category><![CDATA[hacking]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[open ideas]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[photos]]></category>
		<category><![CDATA[presentation]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[slides]]></category>
		<category><![CDATA[Tokyo]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2079</guid>
		<description><![CDATA[This past Saturday was Tokyo BarCamp 2009 at Sun&#8217;s Yoga offices. I of course gave a presentation on Ubiquity and our recent localization efforts, including Parser 2. As you can see, I signed up quickly: CC-BY-NC iMorpheus Here are the slides I used in that session. There are two &#8220;demo&#8221; sections in the slides&#8230; the [...]
]]></description>
			<content:encoded><![CDATA[<p>This past Saturday was <a href="http://barcamp.org/BarCamp-Tokyo2009">Tokyo BarCamp 2009</a> at <a href="http://blogs.sun.com/jimgris/page/yoga">Sun&#8217;s Yoga offices</a>. I of course gave a presentation on Ubiquity and our recent <a href="https://wiki.mozilla.org/Labs/Ubiquity/i18n">localization efforts</a>, including <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2">Parser 2</a>. As you can see, I signed up quickly:</p>

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/05/ubiquity-wall-650.jpg" alt="ubiquity-wall-650.jpg" border="0" width="650" height="504" /><br/><small>CC-BY-NC <a href='http://www.flickr.com/photos/sfj/3535231830/'>iMorpheus</a></small></p>

<p>Here are the slides I used in that session. There are two &#8220;demo&#8221; sections in the slides&#8230; the first was a simple demo of Ubiquity 0.1.x showing off the <code>translate</code>, <code>map</code>, and <code>edit-page</code> commands. The second demo was of <a href="http://vimeo.com/4307110">Ubiquity Parser 2</a>, showing off how little code it takes to <a href="http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/">add your language to Ubiquity with Parser 2</a>.</p>

<p><object style="margin:0px" width="649" height="542"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=barcampubiquity-090515212518-phpapp01&#038;rel=0&#038;stripped_title=ubiquity-command-the-web-with-language" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=barcampubiquity-090515212518-phpapp01&#038;rel=0&#038;stripped_title=ubiquity-command-the-web-with-language" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="649" height="542"></embed></object></p>

<p><span id="more-2079"></span></p>

<p>I also led a brief afternoon discussion on open ideas: the approach of working in the open, making the thought process and decisions public, not just the results. Here are the slides for that one:</p>

<p><object style="margin:0px" width="649" height="542"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=barcampopenideas-090516002650-phpapp01&#038;rel=0&#038;stripped_title=open-ideas-a-conversation" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=barcampopenideas-090516002650-phpapp01&#038;rel=0&#038;stripped_title=open-ideas-a-conversation" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="649" height="542"></embed></object></p>

<p>My impetus for doing this discussion is probably clear to many—as I go back to <a href="http://web.mit.edu/linguistics/">academia</a> this fall, I&#8217;ve been recently wondering about whether I could apply the open process used at Mozilla to my own academic work, blogging frequently about little thoughts and discoveries, even if they turn out to be wrong or dead-ends. For those who are interested, Brian Lockwood caught <a href="http://vimeo.com/4686609">some of my slides and the discussion on video</a>.</p>

<p>I also found a great photo from the discussion—I&#8217;m right there on the left edge.</p>

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/05/open-ideas-discussion.jpg" alt="open-ideas-discussion.jpg" border="0" width="650" height="283" /><br/><small>CC-BY-SA <a href='http://www.flickr.com/photos/lostininaka/3535388674/'>LostInInaka</a></small></p>

<p>Sun&#8217;s facilities had a combination of different styles of rooms (lecture/presentation, this large seminar-style room, and a small-group one), which was very nice for different styles of sessions.</p>

<p>I really enjoyed meeting the Yokohama International School IT department folks, who had an interesting presentation (<a href="http://vimeo.com/4683315">video</a>) on how they live stream their department meetings every week and try to be open and benefit from (and give back to) an ed-tech community larger than themselves. Other highlights for me included <a href="http://en.wikipedia.org/wiki/Mitch Altman">Mitch Altman</a>&#8217;s session on <a href="http://en.wikipedia.org/wiki/hacker spaces">hacker spaces</a> and &#8220;hacking&#8221; of all varieties and sorts, and Karamoon&#8217;s very knowledgeable longer session on security (<a href="http://vimeo.com/4683437">video</a>).</p>

<p>Finally it&#8217;s good to point out that my friend <a href="http://twitter.com/kimtaro">Kim Ahlström</a> open-sourced his website <a href="http://jisho.org">jisho.org</a>. You can now get that entire repository <a href="http://github.com/Kimtaro/jisho.org/tree/master">on github</a>. He also demoed the Cereling library/wrapper for various parsers and morphological tools which <a href="http://smart.fm/">his company</a> will be open-sourcing in the future. Happy hacking!</p>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/life/notes-from-barcamp-tokyo-2009/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Ubiquity in Italian!</title>
		<link>http://mitcho.com/blog/projects/ubiquity-in-italian-2/</link>
		<comments>http://mitcho.com/blog/projects/ubiquity-in-italian-2/#comments</comments>
		<pubDate>Mon, 18 May 2009 04:32:25 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[Italian]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2075</guid>
		<description><![CDATA[Thanks to the great work of Sandro Della Giustina, we now have a preliminary Italian parser for use with Ubiquity Parser 2. Sandro brought up a good point, however, about Italian prepositions which contract with the article and the head noun. For example, traduci dall'inglese al cinese translate from=the=English to=the Chinese One current solution is [...]
]]></description>
			<content:encoded><![CDATA[<p>Thanks to the great work of Sandro Della Giustina, we now have a preliminary Italian parser for use with Ubiquity Parser 2. Sandro brought up a good point, however, about Italian prepositions which contract with the article <em>and</em> the head noun. For example,</p>


<div class="wp_syntax"><div class="code"><pre class="it" style="font-family:monospace;">traduci   dall'inglese     al     cinese
translate from=the=English to=the Chinese</pre></div></div>


<p>One current solution is to add <a href="http://en.wikipedia.org/wiki/zero-width space">zero-width spaces</a> after these contracted articles, <em>all&#8217;</em> and <em>dall&#8217;</em>.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> The appropriate way to add this to the parser is by defining a custom <code>wordBreaker()</code> method.</p>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;">it._patternCache.<span style="color: #660066;">contractionMatcher</span> <span style="color: #339933;">=</span> <span style="color: #003366; font-weight: bold;">new</span> RegExp<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'(^| )(all<span style="color: #000099; font-weight: bold;">\'</span>|dall<span style="color: #000099; font-weight: bold;">\'</span>)'</span><span style="color: #339933;">,</span><span style="color: #3366CC;">'g'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
it.<span style="color: #660066;">wordBreaker</span> <span style="color: #339933;">=</span> <span style="color: #003366; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span>input<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #000066; font-weight: bold;">return</span> input.<span style="color: #660066;">replace</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">this</span>._patternCache.<span style="color: #660066;">contractionMatcher</span><span style="color: #339933;">,</span><span style="color: #3366CC;">'$1$2<span style="color: #000099; font-weight: bold;">\u</span>200b'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span></pre></div></div>
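<p>To see what this does in practice, here is a minimal, self-contained sketch. The bare <code>it</code> object and the final <code>split()</code> call are stand-ins assumed for illustration; the real parser object carries much more, and its actual word-splitting logic may differ.</p>

```javascript
// Hypothetical stand-in for the Italian parser object.
const it = { _patternCache: {} };

// Match "all'" or "dall'" at the start of the input or after a space.
it._patternCache.contractionMatcher = new RegExp("(^| )(all'|dall')", "g");

// Append a zero-width space (U+200B) after each contracted form.
it.wordBreaker = function (input) {
  return input.replace(this._patternCache.contractionMatcher, "$1$2\u200b");
};

const broken = it.wordBreaker("traduci dall'inglese al cinese");

// Assumption: words are later split on spaces and zero-width spaces.
const words = broken.split(/[ \u200b]/);
console.log(words);
```

<p>With the zero-width space in place, <em>dall&#8217;</em> can be picked up as a word of its own, leaving <em>inglese</em> as the argument.</p>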


<p>Grazie Sandro!</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>As <a href="http://blog.mozilla.com/nattokirai/">John Daggett</a> pointed out to me, in the future we may have to add an intermediate shallow parse instead of adding characters (in this case, the zero-width space) to the modified input.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/ubiquity-in-italian-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Inside the Argument</title>
		<link>http://mitcho.com/blog/projects/inside-the-argument/</link>
		<comments>http://mitcho.com/blog/projects/inside-the-argument/#comments</comments>
		<pubDate>Wed, 13 May 2009 02:13:31 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[argument]]></category>
		<category><![CDATA[Catalan]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2046</guid>
		<description><![CDATA[Here&#8217;s a little picture of the different sections of text in a single parsed argument and which properties of the resultant argument object they are assigned to. You&#8217;ll see, from left to right, outerSpace, modifier, innerSpace, inactivePrefix, input/data, inactiveSuffix. The example text is from the Catalan example, &#8220;compra mitjons amb el Google,&#8221; meaning &#8220;buy socks [...]
]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a little picture of the different sections of text in a single parsed argument and which properties of the resultant argument object they are assigned to.</p>

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/05/insidetheargument.jpg" alt="insidetheargument.jpg" border="0" width="650" height="350" /></p>

<p>You&#8217;ll see, from left to right, <code>outerSpace</code>, <code>modifier</code>, <code>innerSpace</code>, <code>inactivePrefix</code>, <code>input</code>/<code>data</code>, <code>inactiveSuffix</code>.</p>

<p>The example text is from the Catalan example, &#8220;compra mitjons amb el Google,&#8221; meaning &#8220;buy socks with Google.&#8221; You&#8217;ll notice the argument &#8220;amb el Google&#8221; is literally &#8220;with the Google.&#8221; The <code>normalizeArgument()</code> method of the Catalan parser, as <a href="http://mitcho.com/blog/projects/solving-a-romantic-problem/">I described earlier this week</a>, strips the article &#8220;el &#8221; and puts it in the <code>inactivePrefix</code> property of the argument.</p>
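<p>As a rough sketch, the argument in the picture might be written out as a plain object like the one below. The property names come from the list above, but the exact shape of the real argument object is an assumption, <code>data</code> is left out for brevity, and the <code>reassemble()</code> helper is hypothetical.</p>

```javascript
// Assumed shape of the parsed argument " amb el Google".
const argument = {
  outerSpace: " ",       // space separating this argument from what precedes it
  modifier: "amb",       // the delimiter ("with")
  innerSpace: " ",       // space between the modifier and the rest
  inactivePrefix: "el ", // the article stripped by normalizeArgument()
  input: "Google",       // what nountype detection actually sees
  inactiveSuffix: ""
};

// Hypothetical helper: concatenating the pieces from left to right
// gives back the original stretch of text.
function reassemble(arg) {
  return arg.outerSpace + arg.modifier + arg.innerSpace +
         arg.inactivePrefix + arg.input + arg.inactiveSuffix;
}

console.log(JSON.stringify(reassemble(argument)));
```

<p>Read left to right, the properties cover the argument text with nothing lost, which is what lets the suggestion display show the ignored article alongside the active input.</p>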

<p>I&#8217;m going to spend the rest of the day updating the <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2">Parser 2 design doc</a> and related documentation so they match these and other recent developments in the parser.</p>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/inside-the-argument/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Solving Another Romantic Problem: Weak Pronouns</title>
		<link>http://mitcho.com/blog/projects/solving-another-romantic-problem/</link>
		<comments>http://mitcho.com/blog/projects/solving-another-romantic-problem/#comments</comments>
		<pubDate>Tue, 12 May 2009 08:09:31 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[Catalan]]></category>
		<category><![CDATA[French]]></category>
		<category><![CDATA[Italian]]></category>
		<category><![CDATA[Modern Greek]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[Portuguese]]></category>
		<category><![CDATA[romance languages]]></category>
		<category><![CDATA[Spanish]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2035</guid>
		<description><![CDATA[Yesterday I blogged on how to deal with portmanteau&#8217;ed prepositions in Ubiquity Parser 2, a common problem in various romance languages. Today I&#8217;ll propose an approach to another romantic problem. The problem: Weak pronouns in romance languages (as well as some other languages) have a special property where they cliticize to the verb, moving from [...]
]]></description>
			<content:encoded><![CDATA[<p><em>Yesterday I blogged on <a href="http://mitcho.com/blog/projects/solving-a-romantic-problem-portmanteaued-prepositions/">how to deal with portmanteau&#8217;ed prepositions in Ubiquity Parser 2</a>, a common problem in various romance languages. Today I&#8217;ll propose an approach to another romantic problem.</em></p>

<h3>The problem:</h3>

<p>Weak pronouns in <a href="http://en.wikipedia.org/wiki/romance languages">romance languages</a> (as well as some other languages) have a special property where they <em>cliticize</em> to the verb, moving from their regular argument position to a position next to the verb. For example, in French, we have an imperative like (1) with gloss as (2):</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
</pre></td><td class="code"><pre class="fr" style="font-family:monospace;">Envoyez  la  lettre à  Pierre!
send.IMP the letter to Pierre</pre></td></tr></table></div>


<p>If we replace <em>la lettre</em> or <em>à Pierre</em> with a pronoun (<em>la</em>, &#8220;it&#8221;, or <em>lui</em>, &#8220;to him&#8221;, respectively), those weak pronouns move next to the verb. In particular, (5) exemplifies the change in word order. Replacing both arguments with pronouns creates the stacked clitic form of (7).<sup id="fnref:3"><a href="#fn:3" rel="footnote">1</a></sup></p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>3
4
5
6
7
8
</pre></td><td class="code"><pre class="fr" style="font-family:monospace;">Envoyez-la à  Pierre!
send   -it to Pierre
Envoyez-lui la  lettre!
send   -him the letter
Envoyez-la-lui!
send   -it-him</pre></td></tr></table></div>


<p>The fact that these weak pronouns are attached to the verb and lack separate delimiters means that we will need a separate mechanism to parse these arguments: indeed, this functionality has been planned in <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2">Ubiquity Parser 2</a> as &#8220;step 3&#8221;. Here I&#8217;ll examine some data and discuss a strategy for the parsing of weak pronouns.</p>

<p><span id="more-2035"></span></p>

<h3>Weak pronouns in Ubiquity</h3>

<p>In Ubiquity the only pronoun we currently deal with is the deictic <code>object</code>-role anaphor, like &#8220;it,&#8221; &#8220;this,&#8221; etc. in English.<sup id="fnref:1"><a href="#fn:1" rel="footnote">2</a></sup> In addition, as these weak pronoun clitics cannot by definition be embedded within a larger noun phrase, their referent would constitute the entire <code>object</code> argument. As such, it is most logical to place clitic handling before argument structure parsing and simply hand the argument parser the argument string without the clitic.</p>

<h3>Marking the clitic</h3>

<p>We can classify languages with cliticized weak pronouns into two cases based on their processing considerations: languages that overtly mark the clitic and those which do not.</p>

<h4>Languages which delimit the clitic</h4>

<p>Some languages such as French (see above) clearly mark the boundary between the verb and the clitic. It will be relatively easy to parse weak pronouns in such languages as we can simply <a href="http://ubiquity.mozilla.com/trac/ticket/665">insert a zero-width space</a> between the verb and the clitic. A list of clitics can then be designated in the parser (much like anaphora are now) and these weak pronouns can be interpreted as the selection (or &#8220;this&#8221;-referent).<sup id="fnref:2"><a href="#fn:2" rel="footnote">3</a></sup></p>
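<p>A minimal sketch of that idea for French might look like the following. Nothing here exists in the parser yet: the <code>frenchClitics</code> list and the <code>breakClitics()</code> function are hypothetical names, and replacing the hyphen outright (rather than keeping it) is just one possible choice.</p>

```javascript
// Hypothetical list of French weak pronouns to look for.
const frenchClitics = ["le", "la", "les", "lui"];

// Replace the hyphen before each listed clitic with a zero-width space
// (U+200B), so that the verb and each clitic become separate words.
function breakClitics(input, clitics) {
  // The lookahead requires the clitic to end at another hyphen,
  // at whitespace, or at the end of the input.
  const pattern = new RegExp("-(" + clitics.join("|") + ")(?=-|\\s|$)", "g");
  return input.replace(pattern, "\u200b$1");
}

const single = breakClitics("traduisez-le", frenchClitics);
const stacked = breakClitics("envoyez-la-lui", frenchClitics);
console.log(single.split("\u200b"), stacked.split("\u200b"));
```

<p>Once the clitics are split off like this, they can be looked up in the same way anaphora are now.</p>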

<p><strong>Portuguese:</strong> (from <a href="http://email.eva.mpg.de/~cysouw/pdf/cysouwDGFS.pdf">Cysouw 2003</a>)</p>


<div class="wp_syntax"><div class="code"><pre class="pr" style="font-family:monospace;">Come-o
eat -it</pre></div></div>


<p><strong>Catalan:</strong> (from <a href="http://www.cau.cat/blog/">Toni Hermoso Pulido</a>)</p>


<div class="wp_syntax"><div class="code"><pre class="ca" style="font-family:monospace;">Cerca-ho
search-it</pre></div></div>


<p><strong>Modern Greek:</strong> (from <a href="http://aix1.uottawa.ca/~romlab/pubs/RiveroTerzi.1995.pdf">Rivero and Terzi 1995</a>; I know, I know, Greek&#8217;s not a romance language, but it has weak pronoun clitics too&#8230; it&#8217;s all good.)</p>

<p>Modern Greek actually inserts a space between the verb and weak pronouns.</p>


<div class="wp_syntax"><div class="code"><pre class="ca" style="font-family:monospace;">Diavase to
read   -it</pre></div></div>


<h4>Languages which do not delimit the clitic</h4>

<p>Some languages do not insert any delimiter between the verb and the weak pronoun, essentially entering them as a single word (in the string sense, at least). These cases may be more difficult to parse, especially as there may be sound changes to the verb stem itself.</p>

<p><strong>Italian:</strong> (first example from <a href="http://books.google.com/books?id=tnXJVbGpMfEC">Kayne 1994</a>)</p>

<p>Italian is a case where weak pronouns actually conjoin with the verb in imperatives, much like its prepositions, which, as I noted yesterday, have an elaborate system of portmanteau&#8217;ed forms.</p>


<div class="wp_syntax"><div class="code"><pre class="it" style="font-family:monospace;">Fallo
do-it
Mangialo
eat  -it</pre></div></div>


<p><strong>Spanish:</strong> (first example from <a href="http://aix1.uottawa.ca/~romlab/pubs/RiveroTerzi.1995.pdf">Rivero and Terzi 1995</a>, second from <a href="http://www.cau.cat/blog/">Toni Hermoso Pulido</a>)</p>

<p>Spanish is the same way:</p>


<div class="wp_syntax"><div class="code"><pre class="es" style="font-family:monospace;">Léelo
read-it
Búscalo
search-it</pre></div></div>
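<p>To make the difficulty concrete, here is a naive sketch for Italian that strips a clitic off the end of a word and checks whether what remains is a known verb form. The word lists and the <code>splitClitic()</code> function are hypothetical, purely for illustration.</p>

```javascript
// Hypothetical word lists, for illustration only.
const knownImperatives = ["mangia", "fa"];
const clitics = ["lo", "la", "li", "le"];

// Try each clitic as a suffix; accept the split only if the
// remaining stem is a known imperative form.
function splitClitic(word) {
  const lower = word.toLowerCase();
  for (const clitic of clitics) {
    if (lower.endsWith(clitic)) {
      const stem = lower.slice(0, lower.length - clitic.length);
      if (knownImperatives.includes(stem)) {
        return { verb: stem, clitic: clitic };
      }
    }
  }
  return null; // no clean split found
}

console.log(splitClitic("Mangialo"));
console.log(splitClitic("Fallo"));
```

<p><em>Mangialo</em> splits cleanly into <em>mangia</em> + <em>lo</em>, but <em>Fallo</em> does not: stripping <em>lo</em> leaves <em>fal</em>, not <em>fa</em>, because the consonant is doubled. The added accents in Spanish <em>Léelo</em> and <em>Búscalo</em> would trip up this approach in the same way.</p>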


<h3>Displaying the suggestion</h3>

<p>The current Ubiquity handling of anaphora (aka &#8220;magic words&#8221;) involves a display of the selection (replacement) text in a stylized way. One problem with clitics may be how to visually present this replacement to the user.</p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/05/picture-11.png" alt="Picture 1.png" border="0" width="284" height="160" /></center></p>

<p>For languages with a delimiter such as French we could simply present the selection as an object right after the verb, without the hyphen.</p>

<table>
<tr><th>input:</th><td>traduisez-le (translate-it)</td></tr>
<tr><th>suggestion:</th><td>traduisez <span style='  padding: 2px;
  -moz-border-radius: 3px;
  display: inline-block;
  font-variant: small-caps;
  background-color: #BBB;
  color: #333;
  position: relative;
  top: -2px;
  font-size: 8pt;
  font-weight: normal;
  border: 1px solid #777;'>selection</span></td></tr>
</table>

<p>Things may be more complicated, however, in languages where the clitic is not delimited from the verb, or where the verb form itself has changed due to the attachment of the clitic.</p>

<h3>Conclusion</h3>

<p>In this blog post I&#8217;ve tried to lay out some of the weak pronoun phenomena relevant to Ubiquity with some ideas on how to implement their parsing. I believe parsing weak pronouns should be relatively straightforward in languages with delimiters. For those which do not have delimiters, some creativity may be required in building regular expressions or rules to detect the clitics and in presenting these suggestions to the user.</p>

<p><strong>Does your language have weak pronoun clitics? What do you think will be the challenges in trying to parse these arguments?</strong></p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:3">
<p>Note that the reverse order of &#8220;Envoyez-lui-la&#8221; is ungrammatical&#8230; fortunately we most likely will not have to deal with multiple clitics&#8230; see footnote two below.&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:1">
<p>This is not so much an informed decision that we should not do different kinds of anaphors but simply that we haven&#8217;t gotten around to implementing it. I personally am not sure, however, whether there is a real need for parsing for anaphors for roles other than <code>object</code> (for example, French <em>lui</em> as seen above which would be a <code>goal</code> anaphor).&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:2">
<p>There is, however, a question of whether weak pronoun replacement should be obligatory or not: that is, if we see a regular anaphor right now such as &#8220;this,&#8221; we make two copies of the parse: one with the replacement, one without. In the case where we detect an anaphor, should the replacement be obligatory? I believe it should be, though, as with many other Parser 2 features, I believe we can continue to parse other options with no replacement but let the scoring system kill those parses off. If a verb has a clitic attached to it but we do not remove it, it most likely will do very poorly in scoring anyway.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/solving-another-romantic-problem/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Solving a Romantic Problem: Portmanteau&#8217;ed Prepositions</title>
		<link>http://mitcho.com/blog/projects/solving-a-romantic-problem-portmanteaued-prepositions/</link>
		<comments>http://mitcho.com/blog/projects/solving-a-romantic-problem-portmanteaued-prepositions/#comments</comments>
		<pubDate>Mon, 11 May 2009 05:19:17 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[Catalan]]></category>
		<category><![CDATA[French]]></category>
		<category><![CDATA[Italian]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[portmanteau]]></category>
		<category><![CDATA[romance languages]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2019</guid>
		<description><![CDATA[The problem: In many romance languages, prepositions and articles often form portmanteau morphs, combining to form a single word. Some examples include (French) à + le > au, de + le > du, (Catalan) a + el > al, de + els > dels, per + el > pel. Italian has a particularly productive system [...]
]]></description>
			<content:encoded><![CDATA[<h3>The problem:</h3>

<p>In many <a href="http://en.wikipedia.org/wiki/romance languages">romance languages</a>, prepositions and articles often form <a href="http://en.wikipedia.org/wiki/portmanteau">portmanteau morphs</a>, combining to form a single word.<sup id="fnref:2"><a href="#fn:2" rel="footnote">1</a></sup> Some examples include (French) à + le > au, de + le > du, (Catalan) a + el > al, de + els > dels, per + el > pel. Italian has a particularly productive system of portmanteau&#8217;ed prepositions and articles&#8230; I refer you to the <a href="http://en.wikipedia.org/wiki/Contraction (grammar)#Italian">contraction</a> article on Wikipedia.</p>

<p>As I <a href="http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/">noted a couple weeks ago</a>, however, some combinations do not form portmanteaus.<sup id="fnref:1"><a href="#fn:1" rel="footnote">2</a></sup></p>

<p><span id="more-2019"></span>
<strong>French:</strong></p>

<ol>
<li>à + le > au</li>
<li>à + la > à la</li>
</ol>

<p>The problem with this is that if we use both <em>à</em> and <em>au</em> as delimiters, we may end up passing the definite article to the verb as part of the argument in some cases, but not in other cases.</p>

<ol>
<li>&#8220;<strong>à</strong> la table&#8221; = &#8220;<strong>to</strong> the table&#8221;</li>
<li>&#8220;<strong>au</strong> chat&#8221; = &#8220;<strong>to the</strong> cat&#8221;</li>
</ol>

<h3>The solution:</h3>

<p>The solution is a new step in <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2">the Parser 2 process</a> which normalizes the form of arguments. Each language&#8217;s parser can now optionally define a <code>normalizeArgument()</code> method which takes an argument and returns a list of normalized alternates. Normalized arguments are returned in the form of <code>{prefix: '', newInput: '', suffix: ''}</code>. For example, if you feed &#8220;la table&#8221; to the French <code>normalizeArgument()</code>, it ought to return</p>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#123;</span>prefix<span style="color: #339933;">:</span> <span style="color: #3366CC;">'la '</span><span style="color: #339933;">,</span> newInput<span style="color: #339933;">:</span> <span style="color: #3366CC;">'table'</span><span style="color: #339933;">,</span> suffix<span style="color: #339933;">:</span> <span style="color: #3366CC;">''</span><span style="color: #009900;">&#125;</span><span style="color: #009900;">&#93;</span></pre></div></div>


<p>If there are no possible normalizations, <code>normalizeArgument()</code> should simply return <code>[]</code>. Each alternative returned by <code>normalizeArgument()</code> is substituted into a copy of the possible parses just before nountype detection. The prefixes and suffixes are stored in the argument (as <code>inactivePrefix</code> and <code>inactiveSuffix</code>) so they can be incorporated into the suggestion display.</p>

<p>Here, for example, is how the inactive prefix &#8220;l&#8217;&#8221; is displayed in <a href="chrome://parser-demo/content/index.html">the parser demo</a>. This way the user is told that the &#8220;l&#8217;&#8221; prefix is being ignored, and the nountype detection and verb action can act on the argument &#8220;English&#8221;.<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup> (In the future, of course, we could teach this nountype to accept the Catalan &#8220;anglès&#8221;.)</p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/05/picture-1.png" alt="Picture 1.png" border="0" width="320" height="29" /></center></p>

<p>The easiest way to produce this output is to use the <a href="https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/String/match"><code>String.match()</code></a> method. For sample <code>normalizeArgument()</code> code, I refer you to the <a href="http://ubiquity.mozilla.com/hg/ubiquity-firefox/file/12f5d9abf011/ubiquity/modules/parser/new/ca.js">Catalan</a> and <a href="http://ubiquity.mozilla.com/hg/ubiquity-firefox/file/12f5d9abf011/ubiquity/modules/parser/new/fr.js">French</a> parser files.</p>
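
<p>To make the return format concrete, here is a minimal sketch of what a French-style <code>normalizeArgument()</code> could look like using <code>String.match()</code>. The article list and the regular expression are my own simplifying assumptions for illustration; this is not the code in the linked <code>fr.js</code>.</p>

```javascript
// Hypothetical sketch of a French-style normalizeArgument().
// The set of articles handled here is an assumption for illustration only.
function normalizeArgument(input) {
  // Look for a leading definite article: "le ", "les ", "la " or elided "l'".
  var matches = input.match(/^(les?\s+|la\s+|l')(.+)$/i);
  if (matches === null) {
    return []; // no possible normalizations
  }
  // The article becomes an inactive prefix; the rest is the real argument.
  return [{prefix: matches[1], newInput: matches[2], suffix: ''}];
}
```

<p>With this sketch, &#8220;la table&#8221; yields a single alternate whose prefix is &#8220;la &#8221; and whose new input is &#8220;table&#8221;, while an input with no leading article yields an empty list.</p>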

<p>I hope that this solution will help make Ubiquity with Parser 2 feel <a href="http://mitcho.com/blog/projects/how-natural-should-a-natural-interface-be/">more natural</a> for many romance languages.</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:2">
<p>Thanks to <a href="http://people.ucsc.edu/~jpobrien/">Jeremy O&#8217;Brien</a> for helping me figure out how to refer to this phenomenon.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:1">
<p>This also relates to the issue of <a href="http://ubiquity.mozilla.com/trac/ticket/671">parsing multi-word delimiters</a>, though the argument normalization strategy covered here should reduce the necessity of multi-word delimiters.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:3">
<p>Thank you to contributor <a href="http://www.cau.cat/blog/">Toni Hermoso Pulido</a> for our first attempt at a Catalan parser!&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/solving-a-romantic-problem-portmanteaued-prepositions/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>In Case of Case&#8230;</title>
		<link>http://mitcho.com/blog/projects/in-case-of-case/</link>
		<comments>http://mitcho.com/blog/projects/in-case-of-case/#comments</comments>
		<pubDate>Wed, 06 May 2009 09:54:53 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[Arabic]]></category>
		<category><![CDATA[Basque]]></category>
		<category><![CDATA[case]]></category>
		<category><![CDATA[German]]></category>
		<category><![CDATA[Latin]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[Polish]]></category>
		<category><![CDATA[Turkish]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1994</guid>
		<description><![CDATA[A recently hot topic of discussion in the Ubiquity i18n realm has been how to deal with strongly case-marking languages. As we continue to make steady progress, this is one of the remaining open questions which we must decide as a community how to tackle in Parser 2. Introduction Grammatical case is a marking on nouns [...]
]]></description>
			<content:encoded><![CDATA[<p>A recently hot topic of discussion in the <a href="https://wiki.mozilla.org/Labs/Ubiquity/i18n">Ubiquity i18n</a> realm has been <a href="http://groups.google.com/group/ubiquity-i18n/browse_thread/thread/ab4d876b1ea02d4">how to deal with strongly case-marking languages</a>. As we continue to make <a href="http://ubiquity.mozilla.com/hg/ubiquity-firefox/log?rev=new-parser">steady progress</a>, this is one of the remaining open questions which we must decide as a community how to tackle in Parser 2.</p>

<h3>Introduction</h3>

<p><a href="http://en.wikipedia.org/wiki/Grammatical case">Grammatical case</a> is a marking on nouns that expresses grammatical function. Not all languages exhibit case. In many of the Indo-European languages we hope to bring Ubiquity to, case is realized as a suffix.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<p>Here&#8217;s a classic example of case from <a href="http://en.wikipedia.org/wiki/Latin">Latin</a>. (Line 2 is the gloss of 1, line 4 of 3.)</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
</pre></td><td class="code"><pre class="la" style="font-family:monospace;">canis      virum      momordit
dog=sg.NOM man=sg.ACC bite=3sg.perfect
vir        canem      momordit
man=sg.NOM dog=sg.ACC bite=3sg.perfect</pre></td></tr></table></div>


<p>Example (1) is &#8220;the dog bit the man,&#8221; while example (3) is &#8220;the man bit the dog.&#8221; The only difference, as you see in the gloss, is that the nouns <em>canis</em> and <em>vir</em> are marked with different case endings in the two sentences. By marking the nouns with different cases (here, <a href="http://en.wikipedia.org/wiki/nominative">nominative</a> and <a href="http://en.wikipedia.org/wiki/accusative">accusative</a>), their semantic roles in the sentence (which is the biter and which is the bitee) can be identified unambiguously. (Their positions are also switched in these examples, but in reality Latin has a very free word order: the same sentences with other word orders, including OSV or VSO, are also common.)</p>

<p>At first glance, strongly case-marked languages may look like a godsend for <a href="http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/">identifying the semantic roles of arguments</a>.<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup> If we can easily and unambiguously recognize arguments&#8217; cases to put them in their appropriate semantic roles, this could simplify processing as well as make Ubiquity input follow a <a href="http://mitcho.com/blog/projects/how-natural-should-a-natural-interface-be/">natural syntax</a> for such languages. Unfortunately, there are some significant challenges which must be overcome in order to make the processing of case-markers worthwhile.</p>

<p><span id="more-1994"></span></p>

<h3>The case against case</h3>

<p>There are broadly three different difficulties with dealing with strongly case-marking languages: (1) how to identify case correctly, (2) how to identify the boundaries of the arguments, and (3) what case to use when handing the arguments to the verb&#8217;s preview and execution.</p>

<h4>Parsing for case</h4>

<p>In some languages, it is very easy to recognize different case endings. For example, for Turkish it would be relatively easy to write a regular expression for each of the cases below, even with the <a href="http://en.wikipedia.org/wiki/vowel harmony">vowel harmony</a> as exhibited in the genitive and accusative cases between <em>i</em> and <em>ü</em>.<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup></p>

<table>
<tr>
<th>Case</th>
<th>Ending</th>
<th><i>köy</i> &#8220;village&#8221;</th>
<th>Meaning</th>
</tr>
<tr>
<td>Nominative</td>
<td>Ø (none)</td>
<td><i>köy</i></td>
<td>village</td>
</tr>
<tr>
<td>Genitive</td>
<td><i>-in</i></td>
<td><i>köyün</i></td>
<td>the village&#8217;s<br />
of the village</td>
</tr>
<tr>
<td>Dative</td>
<td><i>-e</i></td>
<td><i>köye</i></td>
<td>to the village</td>
</tr>
<tr>
<td>Accusative</td>
<td><i>-i</i></td>
<td><i>köyü</i></td>
<td>the village</td>
</tr>
<tr>
<td>Ablative</td>
<td><i>-den</i></td>
<td><i>köyden</i></td>
<td>from the village</td>
</tr>
<tr>
<td>Locative</td>
<td><i>-de</i></td>
<td><i>köyde</i></td>
<td>in the village</td>
</tr>
</table>

<p>(Example from <a href="http://en.wikipedia.org/wiki/Turkish language">Turkish language</a> on Wikipedia.)</p>
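
<p>As a rough illustration of how such regular expressions might look, here is a naive sketch covering only the endings in the table above, with their vowel-harmony variants. The function name and the pattern list are hypothetical, and the sketch ignores buffer consonants and any stem that happens to end in one of these letters, so please read it as a toy rather than a usable Turkish case detector.</p>

```javascript
// Naive, hypothetical case guesser for the Turkish endings tabulated above.
// Patterns are ordered from most to least specific so that, for example,
// "-den" is tried before "-de" and "-in" before "-i".
var turkishCases = [
  {name: 'ablative',   pattern: /[dt][ea]n$/},
  {name: 'locative',   pattern: /[dt][ea]$/},
  {name: 'genitive',   pattern: /[ıiuü]n$/},
  {name: 'dative',     pattern: /[ea]$/},
  {name: 'accusative', pattern: /[ıiuü]$/}
];

function guessTurkishCase(word) {
  var hits = turkishCases.filter(function (c) {
    return c.pattern.test(word);
  });
  // The first (most specific) matching pattern wins;
  // a word with no recognized ending is treated as nominative.
  return hits.length === 0 ? 'nominative' : hits[0].name;
}
```

<p>On the forms of <i>köy</i> in the table, this sketch picks out each of the six cases, which is all it is meant to show.</p>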

<p>However, in many other languages identifying case affixes can be quite difficult as they vary greatly depending on the root noun, not to mention irregular declensions. For example, in Polish the nominative <em>student</em> becomes <em>studenta</em> in the accusative, which may look like a simple suffix, but the nominative <em>pies</em> (&#8220;dog&#8221;) becomes <em>psa</em> while <em>stół</em> (&#8220;table&#8221;) remains unchanged.<sup id="fnref:4"><a href="#fn:4" rel="footnote">4</a></sup> Writing rules for these differing (and sometimes ambiguous) case-marking paradigms without building in lexical information would be very difficult indeed.</p>

<h4>Finding the edges</h4>

<p>Recall that the current <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2">Ubiquity Parser 2 design</a> identifies arguments by identifying known delimiters (most often some adposition) as a left or right edge of an argument. By not having to run the nountype detection over every substring of the input, we greatly reduce the processing time needed in each parse. This approach, however, relies on our being able to reliably identify some sort of boundary for each of our arguments.</p>

<p>In strongly case-marking languages, the case is realized on the noun itself, but this noun may be buried in the middle of the noun phrase. Even if we could reliably identify the case-marker, it would mark neither the left nor right edge of the argument, making our current parsing strategy worthless. For example, consider the following Arabic example of &#8220;the house of the man&#8221; in nominative and accusative cases:<sup id="fnref:6"><a href="#fn:6" rel="footnote">5</a></sup></p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>5
6
7
8
</pre></td><td class="code"><pre class="ar" style="font-family:monospace;">baytu 'r-rajuli
house=NOM of=man
bayta 'r-rajuli
house=ACC of=man</pre></td></tr></table></div>


<p>In these cases, we see that the only distinction between (5) (بَيتُ الرَّجُلِ) and (7) (بَيتَ الرَّجُلِ) is the case suffix on the head noun, &#8220;house,&#8221; which sits in the middle of the noun phrase. Even if we could properly identify this case ending, it would mark neither the left nor the right edge of the entire argument.</p>

<p>Contrast this with German where, even though arguments have case, the case is realized on the article, not on the noun head itself, so we can essentially deal with these articles as prepositions, using the article as the left edge of the argument.</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>9
10
11
12
</pre></td><td class="code"><pre class="de" style="font-family:monospace;">den     großen Hund
the=ACC big    dog
dem     großen Hund
the=DAT big    dog</pre></td></tr></table></div>


<p>Believe it or not, things can actually get even worse than just not being able to find an edge of our arguments. The worst-case scenario comes from discontinuous constituents, in languages where case marking on both nouns and modifiers allows for very free word order. Latin is just such a language:<sup id="fnref:5"><a href="#fn:5" rel="footnote">6</a></sup></p>

<p>From M. Tullius Cicero, &#8220;Against Catiline,&#8221; chapter 1:<br/>


<div class="wp_syntax"><div class="code"><pre class="la" style="font-family:monospace;">quem        ad finem      sese effrenata                       iactabit         audacia?
what=sg.ACC to extent=ACC self unbridle=perf-past-part.3sg.NOM fling=future.3sg audacity=sg.NOM</pre></div></div>


<br/>
&#8220;To what extent will (your) unbridled audacity fling itself about?&#8221;
</p>

<p>In this example we see that <em>effrenata</em> modifies <em>audacia</em>. The two do not form a unit in the linear order, but their relationship can be recovered because both words carry the nominative case marking. It would be unfair to expect Ubiquity ever to parse such arguments properly, so some discipline would be required from the user; still, this illustrates how bad things could get if we took the processing of case-markers to the extreme.</p>

<h4>The proper case for execution</h4>

<p>The final difficulty in processing case-markings in Ubiquity comes from the preview and execution stages of a Ubiquity command&#8217;s usage. That is, after we parse the input, we must give the verb the arguments we found so that it can display a meaningful preview or behave correctly when executed. At this point, what case should the noun be when we hand the string of the argument to the verb?</p>

<p><a href="http://www.flickr.com/photos/43567335@N00/275046371/" title="CAVE CANEM" target="_blank"><img src="http://farm1.static.flickr.com/120/275046371_9080289d04.jpg" alt="CAVE CANEM" border="0" /></a><br /><small><a href="http://creativecommons.org/licenses/by-sa/2.0/" title="Attribution-ShareAlike License" target="_blank"><img src="http://mitcho.com/blog/wp-content/plugins/photo-dropper/images/cc.png" alt="Creative Commons License" border="0" width="16" height="16" align="absmiddle" /></a> <a href="http://www.photodropper.com/photos/" target="_blank">photo</a> credit: <a href="http://www.flickr.com/photos/43567335@N00/275046371/" title="Platinatore" target="_blank">Platinatore</a></small></p>

<p>For example, consider the Latin expression &#8220;cave canem!&#8221; meaning &#8220;beware the dog!&#8221;</p>


<div class="wp_syntax"><div class="code"><pre class="la" style="font-family:monospace;">cave              canem!
beware=imperative dog=sg.ACC</pre></div></div>


<p>Supposing for a moment that we&#8217;ve implemented the <em>cavere</em> (&#8220;beware&#8221;) verb in Ubiquity and properly parsed &#8220;cave canem,&#8221; should we pass the literal string &#8220;canem&#8221; in accusative case to the verb, or the nominative string &#8220;canis,&#8221; or the root &#8220;can-&#8221;? Which is more appropriate? If &#8220;canis&#8221; is the more appropriate choice, Ubiquity would then have to be responsible for declining the accusative into a nominative&#8230; for all case-marked languages. This is clearly a road we do not want to go down.</p>

<h3>Proposal: only support determiners and adpositions</h3>

<p>I&#8217;ve laid out three reasons why processing strongly case-marked languages in Ubiquity is a non-starter. Fortunately, languages often have multiple different strategies for accomplishing similar communicative tasks. One oft-used strategy for <a href="http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/">marking different roles of arguments</a> is the use of <strong>adpositions</strong> (a fancy term for prepositions and postpositions). Unlike case-markers which often are affixes on nouns, prepositions mark the beginning of an argument and postpositions the end, as is used in the current parsing strategy.</p>

<p>From a formal/theoretical perspective, adpositions sit above the noun phrase proper, while modifiers like adjectives live within the noun phrase. This reflects the fact that, with few exceptions, adpositions mark an edge of the noun phrase, which is crucial to our parsing strategy. (Here, PP is a prepositional phrase and NP is a noun phrase.) Note also that for languages such as German which mark case on determiners (D), the same logic holds.</p>

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/05/dcaa2cd9-4c7b-45fd-8a44-75c25b1b5561.jpg" alt="DCAA2CD9-4C7B-45FD-8A44-75C25B1B5561.jpg" border="0" width="126" height="106" style='vertical-align:middle;padding:5px;' /><img src="http://mitcho.com/blog/wp-content/uploads/2009/05/936098e0-425b-43e1-8cec-d188d43cc942.jpg" alt="936098E0-425B-43E1-8CEC-D188D43CC942.jpg" border="0" width="170" height="134" style='vertical-align:middle;padding:5px;' /></p>

<p>Note also that, as long as the case-marking is phrase-marking (i.e. marking the edge of the noun phrase) rather than just affixing to the head noun, our parsing strategy will work. This means we could possibly in the future write a simple RegExp to split off the Basque dative suffix, as it marks the end of the entire noun phrase. This can be seen in the following data from <a href="http://www.loria.fr/~tseng/Pubs/lsk04.pdf">Tseng (2004)</a>, where the suffix <em>-(r)i</em> affixes to the last word in the noun phrase, no matter the part of speech of that last word. (Basque is crazy cool!)</p>

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/05/picture-2.png" alt="Picture 2.png" border="0" width="539" height="66" /></p>
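
<p>To give a feel for what such a RegExp might look like, here is a hypothetical sketch that splits a phrase-final <em>-(r)i</em> off an argument. It returns the <code>{prefix, newInput, suffix}</code> shape that Parser 2 uses for normalized arguments. The function name, the vowel test, and the phrase <em>etxe handiari</em> (&#8220;to the big house&#8221;) are my own assumptions for illustration, and the sketch would happily misfire on any word that merely ends in <em>i</em>.</p>

```javascript
// Hypothetical sketch: split a phrase-final Basque dative -(r)i off an argument.
// Assumes "-ri" follows a vowel and plain "-i" follows a consonant.
function splitBasqueDative(argument) {
  // First try the post-vowel form "-ri".
  var matches = argument.match(/^(.*[aeiou])(ri)$/i);
  if (matches === null) {
    // Otherwise try the post-consonant form "-i".
    matches = argument.match(/^(.*[^aeiou\s])(i)$/i);
  }
  if (matches === null) {
    return []; // no dative suffix recognized
  }
  // The suffix is set aside; the rest of the phrase is the argument proper.
  return [{prefix: '', newInput: matches[1], suffix: matches[2]}];
}
```

<p>Because the suffix sits at the right edge of the whole phrase, the pattern only ever needs to look at the end of the string, which is exactly what makes this kind of case-marking compatible with the current parsing strategy.</p>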

<h3>Conclusion</h3>

<p>In this blog post I&#8217;ve outlined some reasons why it would be unreasonable or very difficult to incorporate case-marker processing into our <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2">current parser strategy</a>. The case markers themselves are often hard to identify, the case markers do not align at the edge of arguments, and there is the question of what form of the argument should be passed to the verb for preview and/or execution. Luckily many languages allow for adpositions (prepositions and postpositions) as an alternative strategy to case as a means of marking the different grammatical functions of arguments. By limiting Ubiquity parsing to adpositions (and case-marked determiners), I believe we are able to reach a good compromise between each user&#8217;s natural language and an easily machine-processable form.</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>Note that when linguists talk about &#8220;case,&#8221; they could be referring to two different (though related) concepts: case (lowercase) is the observed pattern of affixes on nouns which indicate grammatical function, while Case (uppercase) refers to a theoretical (formal) feature of syntactic objects—certain lexical items &#8220;assign Case&#8221; or &#8220;receive Case&#8221; and its mismatches were ruled out in <a href="http://en.wikipedia.org/wiki/Government and binding theory">GB</a> syntax by the Case Filter. You&#8217;ll find GB linguistics papers referring to &#8220;case&#8221; when discussing Mandarin Chinese, for example, a language that doesn&#8217;t have any overt case (lowercase) and you&#8217;ll know immediately that this usage is an uppercase Case case. In this blog post I&#8217;ll be dealing primarily with the former descriptive notion.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:2">
<p>When I refer to &#8220;strongly case-marking languages,&#8221; I am referring to languages with a non-trivial inventory of cases (not just nominative, accusative, and genitive) and where a noun phrase&#8217;s case is not reflected on <a href="http://en.wikipedia.org/wiki/determiner (class)">determiners</a>. For example, <a href="http://en.wikipedia.org/wiki/German language">German</a> is excluded by this definition as case is realized exclusively on articles and there is no need to find and parse the noun head itself to identify its case—more information on German is in the section &#8220;finding the edges.&#8221;&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:3">
<p>In reality Turkish case morphology does get a little more complicated than this with some consonants shifting as well, but it is still possible to <a href="http://www.sfs.uni-tuebingen.de/iscl/Theses/makedonski.pdf">identify Turkish case with regular expressions</a>.&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:4">
<p>For those of you who were curious, this difference in Polish is based on the differing genders of each of these words. Data from <a href="http://en.wikipedia.org/wiki/Polish language">Polish language</a> on Wikipedia.&#160;<a href="#fnref:4" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:6">
<p>Example from <a href="http://en.wikipedia.org/wiki/Iʻrāb">Iʻrāb</a> on Wikipedia.&#160;<a href="#fnref:6" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:5">
<p>Thank you to <a href="http://bpick.tumblr.com/">Bailey Pickens</a> for help with the Latin data.&#160;<a href="#fnref:5" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/in-case-of-case/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Adding Your Language to Ubiquity Parser 2</title>
		<link>http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/</link>
		<comments>http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/#comments</comments>
		<pubDate>Wed, 29 Apr 2009 11:44:20 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[how to]]></category>
		<category><![CDATA[argument structure]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[case marking]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[French]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[l10n]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[localization]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[semantic roles]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1956</guid>
		<description><![CDATA[NOTE: This blog post has now been added to the Ubiquity wiki and is updated there. Please disregard this article and instead follow these instructions. You&#8217;ve seen the video. You speak another language. And you&#8217;re wondering, &#8220;how hard is it to add my language to Ubiquity with Parser 2?&#8221; The answer: not that hard. With [...]
]]></description>
			<content:encoded><![CDATA[<p><strong>NOTE: This blog post has now been added to the Ubiquity wiki and is updated there. Please disregard this article and instead follow <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2/Localization_Tutorial">these instructions</a>.</strong></p>

<p>You&#8217;ve <a href="http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/">seen the video</a>. You speak another language. And you&#8217;re wondering, <strong>&#8220;how hard is it to add my language to Ubiquity with Parser 2?&#8221;</strong> The answer: <strong>not that hard.</strong> With a little bit of JavaScript and knowledge of and interest in your own language, you&#8217;ll be able to get at least rudimentary Ubiquity functionality in your language. Follow along in this step by step guide and please <a href="http://ubiquity.mozilla.com/trac/ticket/662">submit your (even incomplete) language files</a>!</p>

<p><em>As Ubiquity Parser 2 evolves, there is a chance that this specification will change in the future. Keep abreast of such changes on the <a href="http://ubiquity.mozilla.com/planet/">Ubiquity Planet</a> and/or <a href="http://mitcho.com/blog/">this blog</a> (<a href="http://mitcho.com/blog/feed/blog-only/">RSS</a>).</em></p>

<p><span id="more-1956"></span></p>

<h3>Set up your environment</h3>

<p>If you&#8217;re new to Ubiquity core development, you&#8217;ll want to first read the <a href="http://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_0.1_Development_Tutorial">Ubiquity 0.1 Development Tutorial</a> to learn how to get a live copy of the Ubiquity repository using <a href="http://en.wikipedia.org/wiki/Mercurial">Mercurial</a>. Once you&#8217;ve set up your Firefox profile to use this development version, make sure to try changing the <code>extensions.ubiquity.parserVersion</code> value to 2 in <code>about:config</code> (as seen in <a href="http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/">this demo video</a>) to verify that Parser 2 is working for you.</p>

<p>As you read along, you may find it beneficial to follow along in the languages currently included in Parser 2: <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/en.js">English</a>, <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/ja.js">Japanese</a>, <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/pt.js">Portuguese</a>, and <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/sv.js">Swedish</a> (and the incomplete <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/zh.js">Chinese</a> and <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/fr.js">French</a>).</p>

<h3>The structure of the language file</h3>

<p>Each language in Parser 2 gets its own file which acts as a <a href="https://developer.mozilla.org/En/Using_JavaScript_code_modules">JavaScript module</a>. You&#8217;ll need to look up the <a href="http://en.wikipedia.org/wiki/List of ISO 639-1 codes">ISO 639-1 code for your language</a>&#8230; Here we&#8217;ll use English (code <code>en</code>) as an example; the JavaScript language file would then be called <code>en.js</code> and go in the <code>/ubiquity/modules/parser/new/</code> directory of the repository.</p>

<p>Here is the basic template for a Ubiquity Parser 2 language file:</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
</pre></td><td class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">var</span> EXPORTED_SYMBOLS <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span><span style="color: #3366CC;">&quot;makeEnParser&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000066; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">typeof</span> window<span style="color: #009900;">&#41;</span> <span style="color: #339933;">==</span> <span style="color: #3366CC;">'undefined'</span><span style="color: #009900;">&#41;</span> <span style="color: #006600; font-style: italic;">// kick it chrome style</span>
  Components.<span style="color: #660066;">utils</span>.<span style="color: #003366; font-weight: bold;">import</span><span style="color: #009900;">&#40;</span><span style="color: #3366CC;">&quot;resource://ubiquity/modules/parser/new/parser.js&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #003366; font-weight: bold;">function</span> makeEnParser<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #003366; font-weight: bold;">var</span> en <span style="color: #339933;">=</span> <span style="color: #003366; font-weight: bold;">new</span> Parser<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'en'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
...
&nbsp;
  <span style="color: #000066; font-weight: bold;">return</span> en<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>


<p>After lines 1-4 which set up the <a href="https://developer.mozilla.org/En/Using_JavaScript_code_modules">JavaScript module</a>, everything else is wrapped in a factory function called <code>makeLaParser</code> (for Latin) or <code>makeEnParser</code> (for English, <code>en</code>) or <code>makeFrParser</code> (for French, <code>fr</code>), etc. This function initializes the new <code>Parser</code> object (line 7) with the appropriate language code, sets a bunch of parameters (elided above) and returns it. That&#8217;s it!</p>

<p>Now let&#8217;s walk through some of the parameters you must set to get your language working. For reference, the properties the language parser object is required to have are: <code>branching</code>, <code>anaphora</code>, and <code>roles</code>.</p>

<h3>Identifying your branching parameter</h3>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;">  en.<span style="color: #660066;">branching</span> <span style="color: #339933;">=</span> <span style="color: #3366CC;">'right'</span><span style="color: #339933;">;</span> <span style="color: #006600; font-style: italic;">// or 'left'</span></pre></div></div>


<p>One of the first things you&#8217;ll have to set for your parser is <strong>the <code>branching</code> parameter</strong>. Ubiquity Parser 2 uses the branching parameter to decide which direction to look for an argument after finding a delimiter or &#8220;role marker&#8221; (most often, these are <a href="http://en.wikipedia.org/wiki/adposition">prepositions or postpositions</a>). For example, in English &#8220;from&#8221; is a delimiter for the <code>source</code> role and its argument is on its right.</p>

<table>
<tr><td>&nbsp;</td><td>&nbsp;</td><td colspan='2' style='background: transparent url(http://mitcho.com/i/cccarrow-right.png) no-repeat right bottom'>&nbsp;</td></tr>
<tr><td><b>to</b></td><td>Mary</td><td><b>from</b></td><td>John</td></tr>
</table>

<p>So &#8220;John&#8221; is a possible argument for the <code>source</code> role, but &#8220;Mary&#8221; should not be. Ubiquity can figure this out because English has the property <code>en.branching = 'right'</code>.</p>

<p>In Japanese, on the other hand, the argument of a delimiter like から (&#8220;from&#8221;) is found on the left of that delimiter, so a Japanese parser would set <code>.branching = 'left'</code>.</p>

<table>
<tr><td colspan='2' style='background: transparent url(http://mitcho.com/i/cccarrow-left.png) no-repeat left bottom'>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><td>メアリー</td><td><b>-から</b></td><td>ジョン</td><td><b>-に</b></td></tr>
<tr><td>Mary</td><td><b>from</b></td><td>John</td><td><b>to</b></td></tr>
</table>

<p>In general, if your language has prepositions, you should use <code>.branching = 'right'</code> and if your language has postpositions, you should use <code>.branching = 'left'</code>.</p>
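<p>As a rough illustration of what the branching parameter controls, here is a minimal sketch. The function <code>argumentCandidates</code> is a hypothetical helper written for this post, not part of Ubiquity, and it simplifies things by returning everything on the relevant side of the delimiter rather than stopping at the next delimiter.</p>

```javascript
// Hypothetical helper, not Ubiquity's actual implementation: given a list of
// words and the index of a delimiter, return the words on the side where the
// branching parameter says the argument should be found.
function argumentCandidates(words, delimiterIndex, branching) {
  if (branching === 'right') {
    // Prepositional languages: the argument follows the delimiter.
    return words.slice(delimiterIndex + 1);
  }
  if (branching === 'left') {
    // Postpositional languages: the argument precedes the delimiter.
    return words.slice(0, delimiterIndex);
  }
  throw new Error("branching must be 'left' or 'right'");
}

var english = ['to', 'Mary', 'from', 'John'];
var japanese = ['メアリー', 'から', 'ジョン', 'に'];

var enArg = argumentCandidates(english, 2, 'right'); // words after "from"
var jaArg = argumentCandidates(japanese, 1, 'left'); // words before "から"
```

<p>With the two example sentences from the tables above, the English call only looks at &#8220;John&#8221; and the Japanese call only looks at &#8220;メアリー&#8221;.</p>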

<p><strong>For more info</strong>:</p>

<ul>
<li>see <a href="http://en.wikipedia.org/wiki/Branching (linguistics)">branching</a> on Wikipedia.</li>
</ul>

<h3>Defining your roles</h3>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;">  en.<span style="color: #660066;">roles</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'goal'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'to'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'source'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'from'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'position'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'at'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'position'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'on'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'alias'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'as'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'instrument'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'using'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'instrument'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'with'</span><span style="color: #009900;">&#125;</span>
  <span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span></pre></div></div>


<p>The second required property is <code>roles</code>, the list of semantic roles and their corresponding delimiters. Each entry pairs a <code>role</code> from the <a href="http://mitcho.com/blog/projects/rolling-out-the-roles/">inventory of semantic roles</a> with a delimiter. Note that this mapping can be <a href="http://en.wikipedia.org/wiki/many-to-many (data model)">many-to-many</a>, i.e., each role can have multiple possible delimiters and different roles can share a delimiter. Try to cover all of the roles in the <a href="http://mitcho.com/blog/projects/rolling-out-the-roles/">inventory of semantic roles</a>.</p>
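<p>To make the many-to-many point concrete, here is a small sketch that queries the English list in both directions. The helpers <code>rolesForDelimiter</code> and <code>delimitersForRole</code> are hypothetical names used only for illustration; Ubiquity&#8217;s own lookup may well be organized differently.</p>

```javascript
var roles = [
  {role: 'goal', delimiter: 'to'},
  {role: 'source', delimiter: 'from'},
  {role: 'position', delimiter: 'at'},
  {role: 'position', delimiter: 'on'},
  {role: 'alias', delimiter: 'as'},
  {role: 'instrument', delimiter: 'using'},
  {role: 'instrument', delimiter: 'with'}
];

// Hypothetical helper: all roles that a given delimiter can mark.
function rolesForDelimiter(roles, delimiter) {
  return roles
    .filter(function (entry) { return entry.delimiter === delimiter; })
    .map(function (entry) { return entry.role; });
}

// Hypothetical helper: all delimiters that can mark a given role.
function delimitersForRole(roles, role) {
  return roles
    .filter(function (entry) { return entry.role === role; })
    .map(function (entry) { return entry.delimiter; });
}

var instrumentDelimiters = delimitersForRole(roles, 'instrument');
var fromRoles = rolesForDelimiter(roles, 'from');
```

<p>Here the <code>instrument</code> role has two delimiters (&#8220;using&#8221; and &#8220;with&#8221;), while &#8220;from&#8221; maps to the single role <code>source</code>. Because both helpers return lists, a delimiter shared by several roles would be handled the same way.</p>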

<p><strong>For more info:</strong></p>

<ul>
<li><a href="http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/">Writing commands with semantic roles</a></li>
<li><a href="http://mitcho.com/blog/projects/rolling-out-the-roles/">the proposed inventory of semantic roles</a></li>
<li>Wikipedia entry on <a href="http://en.wikipedia.org/wiki/thematic relations">thematic relations</a></li>
</ul>

<h3>Entering your anaphora (&#8220;magic words&#8221;)</h3>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;">  en.<span style="color: #660066;">anaphora</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span><span style="color: #3366CC;">&quot;this&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;that&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;it&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;selection&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;him&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;her&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;them&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span></pre></div></div>


<p>The final required property is the <code>anaphora</code> property, which takes a list of &#8220;magic words&#8221;. Currently there is no distinction between the different <a href="http://en.wikipedia.org/wiki/deixis">deictic</a> <a href="http://en.wikipedia.org/wiki/anaphora (linguistics)">anaphora</a>, even though they might refer to different things.</p>
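<p>One way to picture how such a list might be used is the sketch below. It assumes that a magic word standing alone as an argument is replaced by the user&#8217;s current selection; <code>resolveAnaphor</code> and its parameters are hypothetical and are not taken from the Ubiquity source.</p>

```javascript
var anaphora = ["this", "that", "it", "selection", "him", "her", "them"];

// Hypothetical sketch: if the argument text is exactly one of the magic words
// and there is a selection, use the selection; otherwise keep the text as-is.
// As in the parser, no distinction is made between the different magic words.
function resolveAnaphor(argText, anaphora, selection) {
  var isMagicWord = anaphora.indexOf(argText.toLowerCase()) !== -1;
  return (isMagicWord && selection) ? selection : argText;
}

var resolved = resolveAnaphor('this', anaphora, 'bonjour le monde');
var untouched = resolveAnaphor('Mary', anaphora, 'bonjour le monde');
```

<p>In this sketch &#8220;this&#8221; resolves to the selected text, while &#8220;Mary&#8221; is left alone because it is not in the list.</p>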

<h3>Special cases</h3>

<p>Some special language features can be handled by overriding the default behavior from <code>Parser</code>. Many of these features are still in the works, however, so we&#8217;d love to get your comments!</p>

<h4>Languages with no spaces</h4>

<p>If your language does not delimit arguments (or words, more generally) with spaces, you will need to write a custom <code>wordBreaker()</code> function and set <code>usespaces = false</code> and <code>joindelimiter = ''</code>. For an example, please take a look at the <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/ja.js">Japanese</a> or <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/zh.js">Chinese</a> parser.</p>

<h4>Case marking languages</h4>

<p><strike>If you have a strongly <a href="http://en.wikipedia.org/wiki/grammatical case">case-marked</a> language, you&#8217;ll have to write some rules to identify those different cases in <code>wordBreaker()</code> and then add some extra <code>roles</code> for these case markers, but for a number of languages the current design does not allow an elegant solution for parsing such arguments. Updates to this issue will be posted to <a href="http://ubiquity.mozilla.com/trac/ticket/663">this trac ticket</a>.</p>

<p>In the mean time, however, if you could write a parser even with only the prepositions/postpositions in your language, that would be a great benefit in getting started in your language.</strike> <strong>UPDATE</strong>: a proposal on how to deal with strongly case-marked languages has been written here: <a href="http://mitcho.com/blog/projects/in-case-of-case/">In Case of Case&#8230;</a>.</p>

<h4>Stripping articles</h4>

<p>Some languages have some delimiters which combine with articles. For example, in French, the preposition &#8220;à&#8221; combines with the masculine definite article &#8220;le&#8221; but not &#8220;la&#8221;:</p>

<ol>
<li>à + la = à la</li>
<li>à + le = au</li>
</ol>

<p>You can add both &#8220;à&#8221; and &#8220;au&#8221; as delimiters of the <code>goal</code> role, but then you will get feminine arguments back with the determiner (e.g. &#8220;la table&#8221;) while masculine arguments would be parsed without a determiner (e.g. &#8220;chat&#8221;).</p>

<ol>
<li>&#8220;<b>à</b> la table&#8221; = &#8220;<b>to</b> the table&#8221;</li>
<li>&#8220;<b>au</b> chat&#8221; = &#8220;<b>to the</b> cat&#8221;</li>
</ol>

<p><strike>One possible solution to this is to write a custom <code>cleanArgument()</code> method. After arguments have been parsed and placed in their appropriate roles, each argument text (say, &#8220;la table&#8221; or &#8220;chat&#8221;) are passed to <code>cleanArgument()</code>. You can simply write a <code>cleanArgument()</code> to strip off any &#8220;la &#8221; at the beginning of the input and return it and both example inputs will get normalized arguments: &#8220;table&#8221; and &#8220;chat&#8221;, respectively.</strike> <strong>UPDATE</strong>: For more up-to-date information on how to deal with these types of articles, please see <a href="http://mitcho.com/blog/projects/solving-a-romantic-problem/">Solving a Romance Problem</a>.</p>

<h3>Test your parser</h3>

<p>Now you can go into <code>about:config</code> and change <code>extensions.ubiquity.language</code> to be your language code and restart. All the verbs and nountypes at this point will remain the same as in the English version, but it should obey the argument structure (the word order and delimiters) of your language.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> If you run into any trouble, feel free to ask for help on the <a href="http://groups.google.com/group/ubiquity-i18n">Ubiquity i18n listhost</a> or find me on the Ubiquity IRC channel (mitcho @ irc.mozilla.org#ubiquity). Of course, once you&#8217;re at a good stopping point, please <a href="http://ubiquity.mozilla.com/trac/ticket/662">contribute your language file to Ubiquity</a>!</p>

<h3>More to come&#8230;</h3>

<p>At this point, you&#8217;ve only localized the <a href="http://en.wikipedia.org/wiki/argument structure">argument structure</a> of your language&#8230; additional work will be required to localize the nountypes and verb names, which is <a href="http://groups.google.com/group/ubiquity-i18n/browse_thread/thread/ab4d876b1ea02d4">the subject of ongoing discussion</a>&#8230; <a href="http://groups.google.com/group/ubiquity-i18n">join the Google Group</a> to get in on the discussion!</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>At this point in time it&#8217;s also possible to test your parser at <code>chrome://parser-demo/content/index.html</code> if you make a couple other changes to your code&#8230; for more information, watch the <a href="http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/">Foxkeh demos Ubiquity Parser TNG</a> video. This option gives you more debug info as well.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/' rel='bookmark' title='Ubiquity Parser: The Next Generation Demo'>Ubiquity Parser: The Next Generation Demo</a></li>
<li><a href='http://mitcho.com/blog/projects/rolling-out-the-roles/' rel='bookmark' title='Rolling out the Roles'>Rolling out the Roles</a></li>
<li><a href='http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/' rel='bookmark' title='Foxkeh demos Ubiquity Parser: The Next Generation'>Foxkeh demos Ubiquity Parser: The Next Generation</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Scoring for Optimization</title>
		<link>http://mitcho.com/blog/observation/scoring-for-optimization/</link>
		<comments>http://mitcho.com/blog/observation/scoring-for-optimization/#comments</comments>
		<pubDate>Fri, 24 Apr 2009 09:51:31 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[observation]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[candidates]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[harmonic analysis]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[order]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[ranking]]></category>
		<category><![CDATA[score]]></category>
		<category><![CDATA[suggestions]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1850</guid>
		<description><![CDATA[Suppose you have a number of competing candidates, each of which can be ranked with a score, but it takes a little time to calculate each candidate&#8217;s score. You&#8217;re only interested in the top candidates. You want to come up with a scoring scheme where you can throw the extra candidates out of consideration earlier [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/observation/scoring-and-ranking-suggestions/' rel='bookmark' title='Scoring and Ranking Suggestions'>Scoring and Ranking Suggestions</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/' rel='bookmark' title='Ubiquity Parser: The Next Generation Demo'>Ubiquity Parser: The Next Generation Demo</a></li>
<li><a href='http://mitcho.com/blog/projects/this-week-on-ubiquity-parser-the-next-generation/' rel='bookmark' title='This week on Ubiquity Parser: The Next Generation'>This week on Ubiquity Parser: The Next Generation</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>Suppose you have a number of competing candidates, each of which can be ranked with a score, but it takes a little time to calculate each candidate&#8217;s score. You&#8217;re only interested in the top <img src='http://s0.wp.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n' title='n' class='latex' /> candidates. <strong>You want to come up with a scoring scheme where you can throw the extra candidates out of consideration earlier without sacrificing quality.</strong> Such is <a href="http://mitcho.com/blog/observation/scoring-and-ranking-suggestions/">the problem of scoring and ranking suggestions in Ubiquity</a>. What properties must such a scoring system have?</p>

<p><em>This blog post includes a lot of complex CSS-formatted graphs which may be best viewed in — what else? — <a href="http://mozilla.com">Firefox</a>. You may also want to <a href="http://mitcho.com/blog/observation/scoring-for-optimization/">access this blog post directly</a> rather than through a planet.</em></p>

<p><style type='text/css'>
.mitchostable, .mitchostable tr, .mitchostable td, .mitchostable th {
  border:0;
  margin:0;
  padding:1px;
  background-color: transparent;
  text-align:left;
}
tr.cutoff th, tr.cutoff td { border-bottom: 1px #666 solid }
tr.cutoff td.cutoff {
  font-style: italic;
  font-size: 0.8em;
  color: #666;
  border: 0;
}
.mitchostable img { height: 7px }
.mitchostable span.bar { 
  background-color: #ccc;
  display: inline-block;
  height: 7px;
}
.mitchostable span.arrow-right { 
  background: #ccc url(http://mitcho.com/i/cccarrow-right.png) no-repeat scroll center right;
  display: inline-block;
  height: 7px;
}
.mitchostable span.arrow-left { 
  background: #ccc url(http://mitcho.com/i/cccarrow-left.png) no-repeat scroll center left;
  display: inline-block;
  height: 7px;
}
.mitchostable span.bound-right { 
  background: transparent url(http://mitcho.com/i/bound-right.png) no-repeat scroll center right;
  display: inline-block;
  height: 7px;
}
.mitchostable.threshold {
  background: transparent url(http://mitcho.com/i/000.png) repeat-y scroll 180px 0px;
}
.mitchostable.threshold2 {
  background: transparent url(http://mitcho.com/i/000.png) repeat-y scroll 70px 0px;
}
.mitchostable.threshold *, .mitchostable.threshold2 * {
  background: transparent;
}</p>

<p></style></p>

<table border='0' class='mitchostable'>

<tr><th>candidate 8</th><td><span class='bar' style='width:180px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>candidate 2</th><td><span class='bar' style='width:166px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>candidate 9</th><td><span class='bar' style='width:123px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>candidate 3</th><td><span class='bar' style='width:107px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr class='cutoff'><th>candidate 10</th><td><span class='bar' style='width:96px'>&nbsp;</span></td><td rowspan='2' class='cutoff'>CUTOFF</td></tr>

<tr><th>candidate 5</th><td><span class='bar' style='width:70px'>&nbsp;</span></td></tr>
<tr><th>candidate 1</th><td><span class='bar' style='width:50px'>&nbsp;</span></td></tr>
<tr><th>candidate 7</th><td><span class='bar' style='width:43px'>&nbsp;</span></td></tr>
<tr><th>&#8230;</th><td>&nbsp;</td><td>&nbsp;</td></tr>
</table>

<p>One portion of the problem description above merits clarification: I define &#8220;without sacrificing quality&#8221; to mean that, if we did not throw out any candidates early and waited until all the scores were computed fully and accurately, we would still get the same top <img src='http://s0.wp.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n' title='n' class='latex' /> winners. This already gives us the key insight towards an appropriate solution: <em>we can only throw out a candidate when we know that it has no further chance of making it into the top <img src='http://s0.wp.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n' title='n' class='latex' /> candidates.</em></p>

<p><span id="more-1850"></span></p>

<h3>Let&#8217;s get formal</h3>

<p>Let&#8217;s call <img src='http://s0.wp.com/latex.php?latex=S_%7Bi%7D%28t%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_{i}(t)' title='S_{i}(t)' class='latex' /> the score of candidate <img src='http://s0.wp.com/latex.php?latex=C_%7Bi%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='C_{i}' title='C_{i}' class='latex' /> at time <img src='http://s0.wp.com/latex.php?latex=t&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t' title='t' class='latex' /> in the derivation and we&#8217;ll assume that the score derivations are done in parallel with a unique origin (<img src='http://s0.wp.com/latex.php?latex=t%3D0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=0' title='t=0' class='latex' />).<sup id="fnref:2"><a href="#fn:2" rel="footnote">1</a></sup> We&#8217;ll use the notation <img src='http://s0.wp.com/latex.php?latex=S_%7Bi%7D%28%5Cinfty%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_{i}(&#92;infty)' title='S_{i}(&#92;infty)' class='latex' /> to represent the equilibrium or final score, equal to <img src='http://s0.wp.com/latex.php?latex=S_%7Bi%7D%28t%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_{i}(t)' title='S_{i}(t)' class='latex' /> for all <img src='http://s0.wp.com/latex.php?latex=t+%3E+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t &gt; ' title='t &gt; ' class='latex' /> a certain <img src='http://s0.wp.com/latex.php?latex=t%5E%7B%5Cprime%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t^{&#92;prime}' title='t^{&#92;prime}' class='latex' /> which exists for each candidate. This function <img src='http://s0.wp.com/latex.php?latex=S_%7Bi%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_{i}' title='S_{i}' class='latex' /> thus defines a <a href="http://en.wikipedia.org/wiki/time series">time series</a> for each candidate.</p>

<p>Given a set of candidates <img src='http://s0.wp.com/latex.php?latex=%5Cleft%5C%7BC_1%2CC_2%2C%5Cldots%2CC_k%5Cright%5C%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;left&#92;{C_1,C_2,&#92;ldots,C_k&#92;right&#92;}' title='&#92;left&#92;{C_1,C_2,&#92;ldots,C_k&#92;right&#92;}' class='latex' />, we want to find the best subset of <img src='http://s0.wp.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n' title='n' class='latex' /> candidates; that is, <img src='http://s0.wp.com/latex.php?latex=%5Cleft%5C%7BC_%7Bi_1%7D%2CC_%7Bi_2%7D%2C%5Cldots%2CC_%7Bi_n%7D%5Cright%5C%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;left&#92;{C_{i_1},C_{i_2},&#92;ldots,C_{i_n}&#92;right&#92;}' title='&#92;left&#92;{C_{i_1},C_{i_2},&#92;ldots,C_{i_n}&#92;right&#92;}' class='latex' /> such that</p>

<p><center><img src='http://s.wordpress.com/latex.php?latex=%5Cdisplaystyle%20%5Cforall_%7B%20i%5Cin%20%5C%7Bi_1%2C%5Cdots%2Ci_n%5C%7D%2C%20j%5Cin%20%5C%7B1%2C%5Cdots%2Ck%5C%7D%5Csetminus%5C%7Bi_1%2C%5Cldots%2Ci_n%5C%7D%7D%20S_%7Bi%7D%28%5Cinfty%29%20%5Cgeq%20S_%7Bj%7D%28%5Cinfty%29&#038;bg=ffffff&#038;fg=000000&#038;s=1' alt='\forall_{ i\in \{i_1,\dots,i_n\}, j\in \{1,\dots,k\}\setminus\{i_1,\ldots,i_n\}} S_{i}(\infty) \geq S_{j}(\infty)'/>.</center></p>
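<p>As a reference point, the definition above corresponds to the brute-force approach: compute every final score, sort, and keep the first few. The sketch below does exactly that. The function name <code>topN</code> is my own, and the numbers are simply the bar widths from the diagram at the top of this post, used as stand-in final scores.</p>

```javascript
// Brute-force baseline: assumes every final score has already been computed.
// Returns the names of the n candidates with the highest final scores.
function topN(finalScores, n) {
  return Object.keys(finalScores)
    .sort(function (a, b) { return finalScores[b] - finalScores[a]; })
    .slice(0, n);
}

// Stand-in final scores (bar widths from the first diagram, in pixels).
var finalScores = {
  C1: 50, C2: 166, C3: 107, C5: 70,
  C7: 43, C8: 180, C9: 123, C10: 96
};

var winners = topN(finalScores, 5);
```

<p>Any early-rejection scheme that does not sacrifice quality has to return the same set as this baseline, just with less work.</p>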

<h3>Approach 1: A Threshold Model</h3>

<p>The key insight above would naturally give us what I call the threshold model. Here, we require the score sequences to be non-increasing: <img src='http://s0.wp.com/latex.php?latex=%5Cforall_%7Bt+%3C+t%5E%7B%5Cprime%7D%7D+S_%7Bi%7D%28t%29+%5Cgeq+S_%7Bi%7D%28t%5E%7B%5Cprime%7D%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;forall_{t &lt; t^{&#92;prime}} S_{i}(t) &#92;geq S_{i}(t^{&#92;prime})' title='&#92;forall_{t &lt; t^{&#92;prime}} S_{i}(t) &#92;geq S_{i}(t^{&#92;prime})' class='latex' />. This way, we can naturally throw out candidates which have dropped below a certain threshold <img src='http://s0.wp.com/latex.php?latex=M&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='M' title='M' class='latex' /> (or attained a certain level of badness, you might say), since we can then be sure they will never recover.</p>

<p>For example, suppose the following diagram represents the scores of five different candidates after the first four time steps of the derivation. (The full gray bar marks the initial score (<img src='http://s0.wp.com/latex.php?latex=S_i%280%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_i(0)' title='S_i(0)' class='latex' />) and the arrows indicate the successive score differentials.) The vertical line marks the threshold, <img src='http://s0.wp.com/latex.php?latex=M&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='M' title='M' class='latex' />.</p>

<table border='0' class='mitchostable threshold'>
<tr><th>candidate 1</th><td><span class='bar' style='width:130px'>&nbsp;</span><span class='arrow-left' style='width:20px'>&nbsp;</span><span class='arrow-left' style='width:13px'>&nbsp;</span><span class='arrow-left' style='width:8px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>candidate 2</th><td><span class='bar' style='width:80px'>&nbsp;</span><span class='arrow-left' style='width:50px'>&nbsp;</span><span class='arrow-left' style='width:3px'>&nbsp;</span><span class='arrow-left' style='width:20px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>candidate 3</th><td><span class='bar' style='width:110px'>&nbsp;</span><span class='arrow-left' style='width:30px'>&nbsp;</span><span class='arrow-left' style='width:27px'>&nbsp;</span><span class='arrow-left' style='width:15px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>candidate 4</th><td><span class='bar' style='width:53px'>&nbsp;</span><span class='arrow-left' style='width:20px'>&nbsp;</span><span class='arrow-left' style='width:50px'>&nbsp;</span><span class='arrow-left' style='width:15px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>candidate 5</th><td><span class='bar' style='width:114px'>&nbsp;</span><span class='arrow-left' style='width:3px'>&nbsp;</span><span class='arrow-left' style='width:3px'>&nbsp;</span><span class='arrow-left' style='width:6px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>&#8230;</th><td>&nbsp;</td><td>&nbsp;</td></tr>
</table>

<p>We can tell after four steps that candidates 2 and 4, given that the score sequences are non-increasing, have no chance to finish their derivation with a score <img src='http://s0.wp.com/latex.php?latex=%3E+M&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&gt; M' title='&gt; M' class='latex' />. What is important to note, however, is that <em>candidate 4 already had no chance of beating the threshold after three steps.</em> <strong>There was no need to calculate the fourth step of the score derivation for candidate 4</strong> (<img src='http://s0.wp.com/latex.php?latex=S_%7B4%7D%284%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_{4}(4)' title='S_{4}(4)' class='latex' />). In other words, after three steps, we could completely take candidate 4 out of the running and after another step, take candidate 2 out of the running.</p>

<table>
<tr><td colspan='2'><img src='http://s0.wp.com/latex.php?latex=t%3D2&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=2' title='t=2' class='latex' /></td><td colspan='2'><img src='http://s0.wp.com/latex.php?latex=t%3D3&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=3' title='t=3' class='latex' /></td><td colspan='2'><img src='http://s0.wp.com/latex.php?latex=t%3D4&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=4' title='t=4' class='latex' /></td></tr>
<tr>
<td><table border='0' class='mitchostable threshold2'>
<tr><th>C1</th><td><span class='bar' style='width:113px'>&nbsp;</span><span class='arrow-left' style='width:8px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C2</th><td><span class='bar' style='width:83px'>&nbsp;</span><span class='arrow-left' style='width:20px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C3</th><td><span class='bar' style='width:117px'>&nbsp;</span><span class='arrow-left' style='width:15px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C4</th><td><span class='bar' style='width:73px'>&nbsp;</span><span class='arrow-left' style='width:15px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C5</th><td><span class='bar' style='width:70px'>&nbsp;</span><span class='arrow-left' style='width:6px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>&#8230;</th><td>&nbsp;</td><td>&nbsp;</td></tr>
</table></td><td>→</td>
<td><table border='0' class='mitchostable threshold2'>
<tr><th>C1</th><td><span class='bar' style='width:100px'>&nbsp;</span><span class='arrow-left' style='width:13px'>&nbsp;</span><span class='arrow-left' style='width:8px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C2</th><td><span class='bar' style='width:80px'>&nbsp;</span><span class='arrow-left' style='width:3px'>&nbsp;</span><span class='arrow-left' style='width:20px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C3</th><td><span class='bar' style='width:90px'>&nbsp;</span><span class='arrow-left' style='width:27px'>&nbsp;</span><span class='arrow-left' style='width:15px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th><strike>C4</strike></th><td><span class='bar' style='width:23px'>&nbsp;</span><span class='arrow-left' style='width:50px'>&nbsp;</span><span class='arrow-left' style='width:15px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C5</th><td><span class='bar' style='width:67px'>&nbsp;</span><span class='arrow-left' style='width:3px'>&nbsp;</span><span class='arrow-left' style='width:6px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>&#8230;</th><td>&nbsp;</td><td>&nbsp;</td></tr>
</table></td><td>→</td>
<td><table border='0' class='mitchostable threshold2'>
<tr><th>C1</th><td><span class='bar' style='width:80px'>&nbsp;</span><span class='arrow-left' style='width:20px'>&nbsp;</span><span class='arrow-left' style='width:13px'>&nbsp;</span><span class='arrow-left' style='width:8px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th><strike>C2</strike></th><td><span class='bar' style='width:30px'>&nbsp;</span><span class='arrow-left' style='width:50px'>&nbsp;</span><span class='arrow-left' style='width:3px'>&nbsp;</span><span class='arrow-left' style='width:20px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C3</th><td><span class='bar' style='width:60px'>&nbsp;</span><span class='arrow-left' style='width:30px'>&nbsp;</span><span class='arrow-left' style='width:27px'>&nbsp;</span><span class='arrow-left' style='width:15px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th><strike>C4</strike></th><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><th>C5</th><td><span class='bar' style='width:64px'>&nbsp;</span><span class='arrow-left' style='width:3px'>&nbsp;</span><span class='arrow-left' style='width:3px'>&nbsp;</span><span class='arrow-left' style='width:6px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>&#8230;</th><td>&nbsp;</td><td>&nbsp;</td></tr>
</table></td><td>→</td>
</tr>
</table>

<p>This non-increasing score approach was used in Ubiquity Parser 2 until just recently, and you can in fact still play with it on the <a href="http://mitcho.com/code/ubiquity/parser-demo/">online Ubiquity Parser TNG demo</a>. In that version, every parse started with an initial score of 1 and every score factor would be a value between 0 and 1. Every score factor was multiplied onto the previous score throughout the derivation, making it trivially non-increasing.</p>
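<p>The threshold model with multiplicative score factors can be sketched roughly as follows. This is an illustration under my own assumptions, not the parser&#8217;s actual code: the function name <code>runThresholdModel</code>, the candidate names and the factor values are all made up, and candidates are processed one after another rather than in parallel, which gives the same survivors when the threshold is constant.</p>

```javascript
// Illustrative sketch of the threshold model. Each candidate starts at a score
// of 1 and is multiplied by a sequence of factors between 0 and 1, so its
// score can never rise. Once it drops below the threshold it is abandoned and
// its remaining factors are never evaluated.
function runThresholdModel(factorsByCandidate, threshold) {
  var survivors = {};
  var evaluations = 0; // how many score factors were actually applied

  Object.keys(factorsByCandidate).forEach(function (name) {
    var factors = factorsByCandidate[name];
    var score = 1;
    var alive = true;
    for (var t = 0; t < factors.length; t++) {
      score *= factors[t];
      evaluations++;
      if (score < threshold) {
        alive = false; // non-increasing scores: this candidate cannot recover
        break;
      }
    }
    if (alive) {
      survivors[name] = score;
    }
  });

  return {survivors: survivors, evaluations: evaluations};
}

var result = runThresholdModel({
  A: [0.9, 0.9, 0.9],
  B: [0.5, 0.5, 0.9], // falls to 0.25 after two steps and is dropped
  C: [0.8, 1.0, 0.9]
}, 0.4);
```

<p>In this made-up run, candidate B is dropped after its second factor, so only eight of the nine factors are ever computed, and A and C survive.</p>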

<p><strong>The problem with this approach</strong> is twofold: it is hard to choose a smart threshold and, given a constant threshold, you may get a different number of results for every different candidate set (i.e. parser query). If your score indicates a meaningful value with an a priori specified target of acceptable values, having a threshold makes sense. In the case of Ubiquity, however, the interface expects a certain number of suggestions to be returned.<sup id="fnref:1"><a href="#fn:1" rel="footnote">2</a></sup> If we plan to display five suggestions but the parser only returns four, even though there were other candidates, there must be a very good reason and justification for that threshold value.</p>

<h3>Approach 2: Raising the Bar</h3>

<p>The problem with Approach 1 was that there was no way of guaranteeing that we would yield our predefined <img src='http://s0.wp.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n' title='n' class='latex' /> winning candidates. Even if at some point in the derivation we are left with <img src='http://s0.wp.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n' title='n' class='latex' /> candidates still above the threshold, as the only restriction we have is that our score series are non-increasing, there is still a possibility that those remaining <img src='http://s0.wp.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n' title='n' class='latex' /> candidates&#8217; scores will drop below <img src='http://s0.wp.com/latex.php?latex=M&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='M' title='M' class='latex' /> later in the derivation.</p>

<p>We must instead at some point in the derivation identify <strong>(a)</strong> a set of at least <img src='http://s0.wp.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n' title='n' class='latex' /> candidates which will not get &#8220;worse&#8221; in the derivation and <strong>(b)</strong> candidates which have no chance of overtaking the (a) candidates. In this situation we can safely throw out the (b) candidates.</p>

<p>One way to do this is to require that all the scores <img src='http://s0.wp.com/latex.php?latex=S_%7Bi%7D%28t%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_{i}(t)' title='S_{i}(t)' class='latex' /> are <strong>bounded and non-decreasing</strong>. By virtue of being non-decreasing, our top candidates at any point in our derivation will never get &#8220;worse&#8221; afterwards, satisfying condition (a). If relatively early in the computation we can compute a bound <img src='http://s0.wp.com/latex.php?latex=B_i&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='B_i' title='B_i' class='latex' />, we can identify candidates which will never surpass the top candidates in group (a) above, satisfying condition (b).</p>
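<p>Conditions (a) and (b) can be sketched as a single pruning step. The sketch assumes each candidate carries its current score and a known upper bound; the function name <code>pruneByBound</code>, the field names and all of the numbers are hypothetical. The &#8220;bar&#8221; is the n-th best current score: since scores never decrease, at least n candidates will finish at or above the bar, so any candidate whose bound lies strictly below it can be discarded.</p>

```javascript
// Sketch of one pruning step for bounded, non-decreasing scores.
// candidates: array of {name, score, bound}, where score is the current score
// and bound is an upper bound on the final score.
function pruneByBound(candidates, n) {
  if (candidates.length < n) {
    return candidates.slice(); // fewer than n candidates: nothing can be dropped
  }
  var currentScores = candidates
    .map(function (c) { return c.score; })
    .sort(function (a, b) { return b - a; });
  var bar = currentScores[n - 1]; // the n-th best current score

  // Keep every candidate that could still reach the bar; ties are kept.
  return candidates.filter(function (c) { return c.bound >= bar; });
}

var remaining = pruneByBound([
  {name: 'C1', score: 30, bound: 120},
  {name: 'C2', score: 90, bound: 150},
  {name: 'C3', score: 40, bound: 100},
  {name: 'C4', score: 80, bound: 130},
  {name: 'C5', score: 20, bound: 60}
], 2).map(function (c) { return c.name; });
```

<p>With these made-up numbers and a target of two winners, the bar sits at 80 (the current score of C4), and only C5, whose bound of 60 is below the bar, is thrown out. Repeating the step as scores rise raises the bar and lets more candidates be dropped.</p>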

<p>In the example below, <img src='http://s0.wp.com/latex.php?latex=n%3D2&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n=2' title='n=2' class='latex' /> and the thin bars mark the upper bounds <img src='http://s0.wp.com/latex.php?latex=B_i&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='B_i' title='B_i' class='latex' />. At <img src='http://s0.wp.com/latex.php?latex=t%3D1&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=1' title='t=1' class='latex' /> we can identify candidates 2 and 4 as our top two candidates. Note that there is one candidate, candidate 5, whose upper bound <img src='http://s0.wp.com/latex.php?latex=B_5&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='B_5' title='B_5' class='latex' /> is less than both <img src='http://s0.wp.com/latex.php?latex=S_2%281%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_2(1)' title='S_2(1)' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=S_4%281%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_4(1)' title='S_4(1)' class='latex' />. By definition <img src='http://s0.wp.com/latex.php?latex=S_5%28%5Cinfty%29+%5Cleq+B_5&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_5(&#92;infty) &#92;leq B_5' title='S_5(&#92;infty) &#92;leq B_5' class='latex' />, and because the scores are non-decreasing, <img src='http://s0.wp.com/latex.php?latex=S_2%281%29+%5Cleq+S_2%28%5Cinfty%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_2(1) &#92;leq S_2(&#92;infty)' title='S_2(1) &#92;leq S_2(&#92;infty)' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=S_4%281%29+%5Cleq+S_4%28%5Cinfty%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_4(1) &#92;leq S_4(&#92;infty)' title='S_4(1) &#92;leq S_4(&#92;infty)' class='latex' />. Therefore</p>

<p><center><img src='http://s.wordpress.com/latex.php?latex=S_5%28%5Cinfty%29%20%3C%20S_2%28%5Cinfty%29&#038;bg=ffffff&#038;fg=000000&#038;s=1' alt='S_5(\infty) < S_2(\infty)'/> and <img src='http://s.wordpress.com/latex.php?latex=S_5%28%5Cinfty%29%20%3C%20S_4%28%5Cinfty%29&#038;bg=ffffff&#038;fg=000000&#038;s=1' alt='S_5(\infty) < S_4(\infty)'/></center></p>

<p>and we can thus throw out candidate 5 at this point. By the same logic, after <img src='http://s0.wp.com/latex.php?latex=t%3D2&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=2' title='t=2' class='latex' /> we can throw candidate 2 out of the running.</p>

<table>
<tr><td colspan='2'><img src='http://s0.wp.com/latex.php?latex=t%3D1&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=1' title='t=1' class='latex' /></td><td colspan='2'><img src='http://s0.wp.com/latex.php?latex=t%3D2&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=2' title='t=2' class='latex' /></td><td colspan='2'><img src='http://s0.wp.com/latex.php?latex=t%3D3&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=3' title='t=3' class='latex' /></td></tr>
<tr>
<td><table border='0' class='mitchostable'>
<tr><th>C1</th><td><span class='bar' style='width:28px'>&nbsp;</span><span class='bound-right' style='width:70px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C2</th><td><span class='bar' style='width:59px'>&nbsp;</span><span class='bound-right' style='width:15px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C3</th><td><span class='bar' style='width:49px'>&nbsp;</span><span class='bound-right' style='width:40px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C4</th><td><span class='bar' style='width:83px'>&nbsp;</span><span class='bound-right' style='width:15px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th><strike>C5</strike></th><td><span class='bar' style='width:56px'>&nbsp;</span><span class='bound-right' style='width:6px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>&#8230;</th><td>&nbsp;</td><td>&nbsp;</td></tr>
</table></td><td>→</td>
<td><table border='0' class='mitchostable'>
<tr><th>C1</th><td><span class='bar' style='width:28px'>&nbsp;</span><span class='arrow-right' style='width:56px'>&nbsp;</span><span class='bound-right' style='width:14px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th><strike>C2</strike></th><td><span class='bar' style='width:59px'>&nbsp;</span><span class='arrow-right' style='width:5px'>&nbsp;</span><span class='bound-right' style='width:10px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C3</th><td><span class='bar' style='width:49px'>&nbsp;</span><span class='arrow-right' style='width:20px'>&nbsp;</span><span class='bound-right' style='width:20px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C4</th><td><span class='bar' style='width:83px'>&nbsp;</span><span class='arrow-right' style='width:6px'>&nbsp;</span><span class='bound-right' style='width:9px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th><strike>C5</strike></th><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><th>&#8230;</th><td>&nbsp;</td><td>&nbsp;</td></tr>
</table></td><td>→</td>
<td><table border='0' class='mitchostable'>
<tr><th>C1</th><td><span class='bar' style='width:28px'>&nbsp;</span><span class='arrow-right' style='width:56px'>&nbsp;</span><span class='arrow-right' style='width:4px'>&nbsp;</span><span class='bound-right' style='width:10px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th><strike>C2</strike></th><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><th>C3</th><td><span class='bar' style='width:49px'>&nbsp;</span><span class='arrow-right' style='width:20px'>&nbsp;</span><span class='arrow-right' style='width:15px'>&nbsp;</span><span class='bound-right' style='width:5px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C4</th><td><span class='bar' style='width:83px'>&nbsp;</span><span class='arrow-right' style='width:6px'>&nbsp;</span><span class='arrow-right' style='width:6px'>&nbsp;</span><span class='bound-right' style='width:3px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th><strike>C5</strike></th><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><th>&#8230;</th><td>&nbsp;</td><td>&nbsp;</td></tr>
</table></td><td>→</td>
</tr>
</table>

<p>We call this the &#8220;raising the bar&#8221; method because, at any particular time <img src='http://s0.wp.com/latex.php?latex=t&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t' title='t' class='latex' />, the &#8220;bar&#8221; is <img src='http://s0.wp.com/latex.php?latex=min%5Cleft%28%5Cleft%5C%7B%5Cmbox%7Bthe+%7Dn%5Cmbox%7B+greatest+%7DS_%7Bi%7D%28t%29%5Cmbox%7B+values%7D%5Cright%5C%7D%5Cright%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='min&#92;left(&#92;left&#92;{&#92;mbox{the }n&#92;mbox{ greatest }S_{i}(t)&#92;mbox{ values}&#92;right&#92;}&#92;right)' title='min&#92;left(&#92;left&#92;{&#92;mbox{the }n&#92;mbox{ greatest }S_{i}(t)&#92;mbox{ values}&#92;right&#92;}&#92;right)' class='latex' /> and every other candidate must have an upper bound <img src='http://s0.wp.com/latex.php?latex=B_j&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='B_j' title='B_j' class='latex' /> greater than the bar in order to stay in consideration. Because the component scores are non-decreasing, this &#8220;bar&#8221; is itself non-decreasing, so the set of surviving candidates can only shrink over time.</p>
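<p>The elimination rule above can be sketched in a few lines of JavaScript. This is only an illustration, not code from Ubiquity: the function name <code>raiseTheBar</code>, the <code>score</code> and <code>bound</code> fields, and all of the numbers are assumptions made up for the example (they are not read off the chart).</p>

```javascript
// Hypothetical candidate records: `score` is S_i(t), `bound` is B_i.
// Returns the candidates that survive at the current time step.
function raiseTheBar(candidates, n) {
  // With n or fewer candidates there is nothing to throw out.
  if (candidates.length <= n) return candidates.slice();

  // The bar is the smallest of the n greatest current scores.
  const scores = candidates.map(c => c.score).sort((a, b) => b - a);
  const bar = scores[n - 1];

  // Keep the current leaders, plus anyone whose bound still clears the bar.
  return candidates.filter(c => c.score >= bar || c.bound > bar);
}

// Illustrative numbers only, with n = 2.
const atT1 = [
  { id: 'C1', score: 28, bound: 98 },
  { id: 'C2', score: 59, bound: 74 },
  { id: 'C3', score: 49, bound: 89 },
  { id: 'C4', score: 83, bound: 98 },
  { id: 'C5', score: 50, bound: 56 },
];
// The bar is 59 (C2's score); C5's bound of 56 cannot clear it.
const survivorsT1 = raiseTheBar(atT1, 2);

const atT2 = [
  { id: 'C1', score: 84, bound: 98 },
  { id: 'C2', score: 64, bound: 74 },
  { id: 'C3', score: 78, bound: 89 },
  { id: 'C4', score: 89, bound: 98 },
];
// The bar has risen to 84 (C1's score); C2's bound of 74 cannot clear it.
const survivorsT2 = raiseTheBar(atT2, 2);
```

<p>Note that the sketch keeps more than <code>n</code> candidates as long as their bounds leave them a chance; it only discards those that can no longer reach the bar.</p>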

<p>In the case of <a href="http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/">the Ubiquity parser</a> we could build such a non-decreasing and bounded scoring model by using an additive model. As the main component of parser scoring is <a href="https://ubiquity.mozilla.com/trac/ticket/435">how well the parsed arguments match the verbs&#8217; specified nountypes</a>, we could simply add up the confidence scores of the nountype suggestions, each of which is a value between 0 and 1. This would trivially be non-decreasing. As each parse has a finite and known number of parsed arguments, we could easily determine a bound as well. For example, say a parse with score <img src='http://s0.wp.com/latex.php?latex=S_0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_0' title='S_0' class='latex' /> has two arguments. Before we check each of the nountypes&#8217; match scores, we already know that <img src='http://s0.wp.com/latex.php?latex=S_0%28%5Cinfty%29+%5Cleq+2+%3D+B_0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_0(&#92;infty) &#92;leq 2 = B_0' title='S_0(&#92;infty) &#92;leq 2 = B_0' class='latex' />.</p>
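<p>As a rough sketch of this additive model, the score is the sum of the confidences seen so far and the bound assumes every remaining argument scores a perfect 1. The function names are hypothetical, and tightening the bound as confidences arrive is my own assumption; the text above only relies on the initial bound, which equals the number of arguments.</p>

```javascript
// Sum of the nountype confidences received so far, each assumed to be in [0, 1].
function additiveScore(confidences) {
  return confidences.reduce((sum, c) => sum + c, 0);
}

// Upper bound for a parse with `argCount` arguments: every argument that
// has not been scored yet could still contribute at most 1.
function additiveBound(argCount, confidences = []) {
  const pending = argCount - confidences.length;
  return additiveScore(confidences) + pending;
}

// A two-argument parse before any nountype has been checked: bound is 2.
const initialBound = additiveBound(2);
// After one argument comes back with confidence 0.5, the bound drops to 1.5.
const laterBound = additiveBound(2, [0.5]);
```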

<p>Unfortunately, there are also other factors which we would like to consider in our parses which may not fit into this non-decreasing model so easily&#8230;</p>

<h3>Approach 2&#8217;: The Rising Sun Model<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup></h3>

<p>One problem with both of the previous approaches is that they require the scoring scheme to be either non-increasing or non-decreasing across the derivation. There are many situations, however, where you would want different factors to affect the score both positively and negatively. In the case of the Ubiquity parser, here are some factors which could reasonably raise or lower the score of each parse.</p>

<table>
<tr><th>positive factors</th><th>negative factors</th></tr>
<tr><td>the verb&#8217;s specified nountype matching the argument noun well</td><td>having to suggest the verb</td></tr>
<tr><td>the verb in the input matching the verb well</td><td>multiple arguments parsed for a single <a href='http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/'>semantic role</a></td></tr>
<tr><td>the verb being used often</td><td>the verb missing some arguments</td></tr>
</table>

<p>As we see, there are both positive and negative factors which we hope to consider in scoring our possible Ubiquity parses. The key to making this work is noting that Approach 2 only requires that the scoring series be bounded and non-decreasing <em>after a certain known time in the derivation</em>. For example, even if a parse&#8217;s score decreases a number of times early in the derivation, if after a certain point we can be certain that it is non-decreasing and bounded, we can simply use that bound and start eliminating poor candidates at that time (in this example, after <img src='http://s0.wp.com/latex.php?latex=t%3D2&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=2' title='t=2' class='latex' />).</p>

<p><style type='text/css'>
.mitchostable2, .mitchostable2 tr, .mitchostable2 td, .mitchostable2 th {
  border:0;
  margin:0;
  padding:1px;
  text-align:left;
  vertical-align: bottom;
}
.mitchostable2 {
  background: transparent url(http://mitcho.com/i/000.png) repeat-x 0px 57px;
}
.mitchostable2 * {
  background: transparent;
}
.mitchostable2 span.bar { 
  background-color: #ccc;
  display: inline-block;
  width: 7px;
}
</style></p>

<table border='0' class='mitchostable2'>
<tr>
<td><span class='bar' style='height:150px'>&nbsp;</span></td><td><span class='bar' style='height:120px'>&nbsp;</span></td><td><span class='bar' style='height:90px'>&nbsp;</span></td><td><span class='bar' style='height:50px'>&nbsp;</span></td><td><span class='bar' style='height:60px'>&nbsp;</span></td><td><span class='bar' style='height:72px'>&nbsp;</span></td><td><span class='bar' style='height:80px'>&nbsp;</span></td><td><span class='bar' style='height:82px'>&nbsp;</span></td><td><span class='bar' style='height:90px'>&nbsp;</span></td><td><span class='bar' style='height:92px'>&nbsp;</span></td><td><span class='bar' style='height:92px'>&nbsp;</span></td><td><span class='bar' style='height:93px'>&nbsp;</span></td><td><span class='bar' style='height:93px'>&nbsp;</span></td><td><span class='bar' style='height:94px'>&nbsp;</span></td>
</tr>
<tr><td>0</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>5</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td colspan="3">10</td><td>&nbsp;</td></tr>
</table>
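<p>Reading the bar heights of the chart above as a score series, a short sketch can find the step from which the series never decreases again. The helper name <code>nonDecreasingFrom</code> is hypothetical, and in the parser itself this point would be known in advance from the design of the derivation steps rather than discovered by inspecting scores after the fact.</p>

```javascript
// Returns the earliest index k such that series[k], series[k + 1], ...
// is non-decreasing all the way to the end of the series.
function nonDecreasingFrom(series) {
  let k = Math.max(series.length - 1, 0);
  // Walk backwards while each step is still a non-decrease.
  while (k > 0 && series[k - 1] <= series[k]) k--;
  return k;
}

// Bar heights (in pixels) of the chart above, for t = 0 through t = 13.
const chartHeights = [150, 120, 90, 50, 60, 72, 80, 82, 90, 92, 92, 93, 93, 94];

// The last decrease lands on index 3, so the series is non-decreasing from there on.
const safeStart = nonDecreasingFrom(chartHeights);
```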

<p>This is very much possible in the Ubiquity parser. Given the <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">Ubiquity Parser 2 design</a>, the negative factors, such as whether the parse has a verb from the input or not (step 2), whether multiple arguments are identified with the same semantic role (step 4), and how many of the verb&#8217;s arguments are in the input (step 4), can all be identified early on in the derivation, before the very computationally intensive steps of nountype detection (step 7) and argument suggestion (step 8). In this way, we can front-load all the negative factors in scoring and continue to use a version of Approach 2 to optimize our parsing.</p>

<p>We can moreover make the effect of the negative factors felt across the entire derivation by folding them into a factor between 0 and 1 and multiplying it onto each of the positive factors being added. In other words, we can combine all the negative factors into a single <strong>score multiplier</strong> <img src='http://s0.wp.com/latex.php?latex=%5Cmu_i+%5Cin+%5B0%2C1%5D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mu_i &#92;in [0,1]' title='&#92;mu_i &#92;in [0,1]' class='latex' /> early in the derivation and then, when adding up the positive factors afterwards, simply apply that score multiplier to each term:</p>

<p><center><img src='http://s0.wp.com/latex.php?latex=%5Cmu_%7Bi%7D%28%5Cmbox%7Bpositive+factor+0%7D%29+%2B+%5Cmu_%7Bi%7D%28%5Cmbox%7Bpositive+factor+1%7D%29+%2B+%5Cldots+%2B+%5Cmu_%7Bi%7D%28%5Cmbox%7Bpositive+factor+%7Dm%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mu_{i}(&#92;mbox{positive factor 0}) + &#92;mu_{i}(&#92;mbox{positive factor 1}) + &#92;ldots + &#92;mu_{i}(&#92;mbox{positive factor }m)' title='&#92;mu_{i}(&#92;mbox{positive factor 0}) + &#92;mu_{i}(&#92;mbox{positive factor 1}) + &#92;ldots + &#92;mu_{i}(&#92;mbox{positive factor }m)' class='latex' />.</center></p>

<p>This model is what is going on <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/2bc28033a723/ubiquity/index.html#modules/parser/tng/parser.js">under the hood</a> in <a href="http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/">Ubiquity Parser 2</a>. The <code>Parser.Parse</code> class has a property called <code>.scoreMultiplier</code> which contains the score multiplier <img src='http://s0.wp.com/latex.php?latex=%5Cmu_i&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mu_i' title='&#92;mu_i' class='latex' /> as described above. A method called <code>.getMaxScore()</code> is implemented in addition to <code>.getScore()</code> so that, even before all of the nountype suggestion scores have been computed (e.g., in the case of asynchronous suggestions), <code>.getMaxScore()</code> can be used as an upper bound <img src='http://s0.wp.com/latex.php?latex=B_i&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='B_i' title='B_i' class='latex' /> and compared to the in-progress scores of other candidates. Lower candidates can thus be taken out of consideration earlier in the parse process.</p>
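<p>To make the idea concrete, here is a minimal stand-in for such a parse object. Only the names <code>scoreMultiplier</code>, <code>getScore()</code> and <code>getMaxScore()</code> come from the description above; the class name <code>SketchParse</code>, the <code>argCount</code> and <code>confidences</code> fields and the <code>addConfidence()</code> method are hypothetical, and the actual <code>Parser.Parse</code> implementation may well compute these values differently.</p>

```javascript
// Hypothetical stand-in for a parse; NOT the actual Parser.Parse code.
class SketchParse {
  constructor(scoreMultiplier, argCount) {
    this.scoreMultiplier = scoreMultiplier; // mu_i in [0, 1], fixed early on
    this.argCount = argCount;               // number of parsed arguments
    this.confidences = [];                  // nountype confidences so far, each in [0, 1]
  }

  // Record one nountype confidence as it arrives (possibly asynchronously).
  addConfidence(confidence) {
    this.confidences.push(confidence);
  }

  // In-progress score: mu_i applied to the positive factors seen so far.
  getScore() {
    const sum = this.confidences.reduce((s, c) => s + c, 0);
    return this.scoreMultiplier * sum;
  }

  // Upper bound B_i: assume every outstanding argument scores a perfect 1.
  getMaxScore() {
    const sum = this.confidences.reduce((s, c) => s + c, 0);
    const pending = this.argCount - this.confidences.length;
    return this.scoreMultiplier * (sum + pending);
  }
}

// A parse with no penalties whose two arguments have both been scored.
const leader = new SketchParse(1, 2);
leader.addConfidence(0.75);
leader.addConfidence(0.5);

// A heavily penalized parse whose suggestions have not come back yet.
const laggard = new SketchParse(0.5, 2);

// If only one suggestion is wanted, the laggard can be dropped without
// waiting for its nountype scores: its best case is below the leader's score.
const canDrop = laggard.getMaxScore() < leader.getScore();
```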

<h3>Conclusion</h3>

<p>In this blog post I&#8217;ve laid out a few iterations of approaches I&#8217;ve thought of to the problem of scoring and ranking Ubiquity suggestions in a smart way. While some of the basic mechanisms, namely front-loading the negative factors into a <code>scoreMultiplier</code> and computing the <code>maxScore</code> (or upper bound), have been implemented, the actual optimization algorithm described here, removing parses from consideration earlier in the parser query, has yet to be implemented in Ubiquity Parser 2, and I look forward to seeing it in action. In addition, there are surely factors I haven&#8217;t considered in the scoring or further tricks to improve the optimized scoring algorithm. <strong>I&#8217;d love to get your feedback and ideas on this topic.</strong> Thanks!</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:2">
<p>In the case of Ubiquity Parser 2, we&#8217;ll let the &#8220;time&#8221; values <img src='http://s0.wp.com/latex.php?latex=t&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t' title='t' class='latex' /> refer to the &#8220;steps&#8221; in the derivation, as laid out in <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">the Ubiquity Parser 2 design</a>. Note that these &#8220;steps&#8221; are currently done in parallel across all candidates in the current architecture, making the &#8220;time&#8221; analogy legitimate. I will thus use integer time values here, making this a <a href="http://en.wikipedia.org/wiki/discrete-time">discrete-time</a> model.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:1">
<p>Every Ubiquity parser query takes as a parameter the maximum number of suggestions to be returned. See <a href="https://ubiquity.mozilla.com/trac/ticket/532">the latest parser query interface proposal</a> for details on this interface.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:3">
<p>This naming is an homage to the <a href="http://en.wikipedia.org/wiki/rising sun lemma">rising sun lemma</a> of <a href="http://en.wikipedia.org/wiki/Frigyes Riesz">Frigyes Riesz</a> which uses a similar logic. The apparent connection to the fact that I am Japanese is purely coincidental.&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/observation/scoring-for-optimization/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>A Demonstration of Ubiquity Parser 2</title>
		<link>http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/</link>
		<comments>http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/#comments</comments>
		<pubDate>Fri, 24 Apr 2009 06:45:31 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[verb]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1845</guid>
		<description><![CDATA[Here&#8217;s a quick demonstration of Ubiquity Parser 2, aka &#8220;the new parser.&#8221; I&#8217;ll show you how you can use the parser yourself and point out some highlights of the new functionality. Ubiquity Parser 2: better noun-first suggestions and command localization from mitcho on Vimeo. Testing Parser 2 requires the latest Ubiquity source, as explained here. [...]
]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a quick demonstration of Ubiquity Parser 2, aka &#8220;the new parser.&#8221; I&#8217;ll show you how you can use the parser yourself and point out some highlights of the new functionality.</p>

<p><object width="649" height="365"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=4307110&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=4307110&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="649" height="365"></embed></object><br /><a href="http://vimeo.com/4307110">Ubiquity Parser 2: better noun-first suggestions and command localization</a> from <a href="http://vimeo.com/mitchoyoshitaka">mitcho</a> on <a href="http://vimeo.com">Vimeo</a>.</p>

<p><span id="more-1845"></span></p>

<p>Testing Parser 2 requires the latest Ubiquity source, as explained <a href="https://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_0.1_Development_Tutorial">here</a>. If you find any problems or have suggestions, please add a ticket to <a href="http://ubiquity.mozilla.com/trac/">our trac</a> with the keyword <code>new-parser</code>.</p>

<p>Here are some resources for those of you who would like to read more about different features touched on in this video:</p>

<ul>
<li><a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">The design document for the new parser</a></li>
<li><a href="http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/">Writing commands with semantic roles</a> and <a href="http://mitcho.com/blog/projects/rolling-out-the-roles/">a proposed inventory of semantic roles</a></li>
<li><a href="http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/">Some thoughts on noun-first suggestions and Ubiquity in Japanese</a></li>
</ul>

<p>In the near future we&#8217;ll also be writing up some documentation on how to take advantage of this new parser in your commands.</p>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Attachment Ambiguity—or—when is the gyudon cheap?</title>
		<link>http://mitcho.com/blog/observation/attachment-ambiguity/</link>
		<comments>http://mitcho.com/blog/observation/attachment-ambiguity/#comments</comments>
		<pubDate>Wed, 15 Apr 2009 06:17:05 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[life]]></category>
		<category><![CDATA[observation]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[attachment ambiguity]]></category>
		<category><![CDATA[food]]></category>
		<category><![CDATA[Japanese culture]]></category>
		<category><![CDATA[Japanese language]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[syntax]]></category>
		<category><![CDATA[Tokyo]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1815</guid>
		<description><![CDATA[Every day on the way to work I walk by a fine establishment known as Yoshinoya (吉野家), Japan&#8217;s largest gyudon (牛丼) chain restaurant. For those of you whose lives have yet to be graced by gyudon, it&#8217;s a bowl of rice topped with beef and onions stewed in a sweet-savory soy-based sauce. Loving gyudon and [...]
]]></description>
			<content:encoded><![CDATA[<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/04/yoshinoya.jpg" alt="yoshinoya.jpg" border="0" width="650" height="328" /></p>

<p>Every day on the way to work I walk by a fine establishment known as <a href="http://en.wikipedia.org/wiki/Yoshinoya">Yoshinoya</a> (吉野家), Japan&#8217;s largest <em>gyudon</em> (牛丼) chain restaurant. For those of you whose lives have yet to be graced by <a href="http://en.wikipedia.org/wiki/gyudon">gyudon</a>, it&#8217;s a bowl of rice topped with beef and onions stewed in a sweet-savory soy-based sauce. Loving gyudon and being a cheapskate, I naturally noticed the recent 50-yen-off gyudon promotion at Yoshinoya. The photo above shows part of that sign.</p>

<p>Part of this sign, though, made me think about our <a href="http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/">new Ubiquity parser</a>. In particular, it was the <strong>attachment ambiguity</strong> in the end date of the promotion. The text in the photo above literally reads &#8220;April 15th (Wed.) 8PM until&#8221;. (Note that Japanese is a strongly head-final language, and that the &#8220;until&#8221; is a postposition.) There are two possible readings for this expression, as illustrated by the two <a href="http://en.wikipedia.org/wiki/principle of compositionality">composition</a> trees below.</p>

<p><span id="more-1815"></span></p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/04/yoshinoya-trees.jpg" alt="yoshinoya-trees.jpg" border="0" width="658" height="157" /></center></p>

<p>The first tree, on the left, represents the reading &#8220;until (April 15th 8PM)&#8221;, while the second represents two arguments: &#8220;on April 15th&#8221; and &#8220;until 8PM&#8221;. In other words, in the first reading, the promotion begins at some earlier date and extends until April 15th at 8PM while, in the second reading, the promotion is one day only, on April 15th, until 8PM. Such syntactic ambiguities are called &#8220;attachment ambiguities&#8221; in linguistics, as they are ambiguities in where different arguments &#8220;attach&#8221; in a tree representation.</p>

<p>This attachment ambiguity was possible because there was no clear <a href="http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/">marker</a> on &#8220;April 15th,&#8221; which might have disambiguated it as &#8220;on April 15th&#8221;. In fact, in many languages this time position argument comes with no case marker or preposition, or the marker is optional, making such arguments difficult to parse. If such a sentence is entered with spaces, the <a href="http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/">Ubiquity Parser: The Next Generation</a> would try a parse where &#8220;8PM&#8221; is the &#8220;until&#8221; or <code>goal</code> argument and &#8220;April 15th&#8221; is an <code>object</code> argument, but it would only check its noun type, not put it in <a href="http://mitcho.com/blog/projects/rolling-out-the-roles/">the correct semantic role</a> (<code>position</code>). Perhaps this is something to think about in the future.</p>

<p>These types of situations will surely come up as we continue work on the Ubiquity parser, making it essential to look at different languages. <strong>Are there certain kinds of arguments in your language that do not have any word-external markers such as case or prepositions/postpositions?</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/observation/attachment-ambiguity/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Rolling out the Roles</title>
		<link>http://mitcho.com/blog/projects/rolling-out-the-roles/</link>
		<comments>http://mitcho.com/blog/projects/rolling-out-the-roles/#comments</comments>
		<pubDate>Thu, 09 Apr 2009 07:07:27 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[argument structure]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[proposal]]></category>
		<category><![CDATA[semantic role]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1763</guid>
		<description><![CDATA[Jono and I have recently been working to incorporate the Parser The Next Generation into Ubiquity proper, and this of course involves the process of retooling the standard commands with semantic roles. The first step, however, is to come up with a list of universal semantic roles which the verbs will be rewritten to use [...]
]]></description>
			<content:encoded><![CDATA[<p>Jono and I have recently been working to incorporate the <a href="http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/">Parser The Next Generation</a> into Ubiquity proper, and this of course involves the process of <a href="http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/">retooling the standard commands with semantic roles</a>. The first step, however, is to come up with a list of universal semantic roles which the verbs will be rewritten to use and individual languages&#8217; parsers will be built to identify. Today I have just such a proposal.</p>

<p><span id="more-1763"></span></p>

<h3>Something to consider&#8230;</h3>

<p>As we rewrite the current commands to specify semantic roles instead of specific modifiers, it is important to distinguish between uses of the same English preposition which actually map to different semantic roles. Here are two examples:</p>

<ul>
<li><code>with</code>: English &#8220;with&#8221; can refer to one of two relations: &#8220;together-with&#8221; as in &#8220;share this with Jono&#8221; and &#8220;using-with&#8221; as in &#8220;share this with delicious&#8221; or &#8220;eat this with a fork.&#8221;</li>
<li><code>in</code>: &#8220;in&#8221;, similarly, can refer to two different relations: &#8220;location-in&#8221; as in &#8220;find mexican food in Tokyo&#8221; and &#8220;format-in&#8221; as in &#8220;search Moscow in Russian&#8221; or &#8220;save this page in PDF.&#8221;</li>
</ul>

<p>A quick test for such cases is &#8220;would these markers translate to the same markers in a different language?&#8221; It&#8217;s easy to find a language where the two different &#8220;with&#8221;s and the two different &#8220;in&#8221;s are expressed using different words. <em>With semantic roles in Parser TNG, it&#8217;s okay for multiple semantic roles to share the same delimiters/markers.</em></p>
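<p>As a minimal sketch of that last point, a parser could keep a table in which one marker lists every relation it may introduce. The table shape and the function name below are hypothetical, not part of Parser TNG; the relation labels are the ones used in the examples above.</p>

```javascript
// Hypothetical table: one English marker may signal several relations.
const relationsByMarker = {
  "with": ["together-with", "using-with"],
  "in": ["location-in", "format-in"],
};

// Return every relation a marker could introduce (empty array if unknown).
function candidateRelations(marker) {
  return relationsByMarker[marker.toLowerCase()] || [];
}

console.log(candidateRelations("with"));
```

<p>Each candidate relation would then yield its own possible parse, leaving it to the noun types and the ranking to pick between them.</p>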

<h3>A proposed set of semantic roles</h3>

<p>Here is a set of semantic roles which I would like to propose. <em>Keep in mind that these roles should map to morphological features in languages, not necessarily to the type of content in the argument (which is why we also will keep the noun types).</em></p>

<ul>
<li><code>object</code>: direct object (the default or unmarked argument)</li>
<li><code>goal</code>: the goal or end point of (metaphorical) movement or transition

<ul>
<li>example: in English, arguments marked by &#8220;to&#8221;, &#8220;into&#8221;, &#8220;toward&#8221;, etc.</li>
</ul></li>
<li><code>source</code>: the source or starting point of (metaphorical) movement or transition<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>

<ul>
<li>example: in English, arguments marked by &#8220;from&#8221;, &#8220;by&#8221;, etc.</li>
</ul></li>
<li><code>position</code>: refers to a (metaphorical) location which defines the scope of an action, in contrast to <code>goal</code> and <code>source</code>.

<ul>
<li>example: in English, arguments marked by &#8220;in&#8221;, &#8220;at&#8221;, &#8220;near&#8221;, etc.</li>
</ul></li>
<li><code>instrument</code>: a tool or intermediary to be used 

<ul>
<li>example: in English, arguments marked by &#8220;using&#8221; or &#8220;with&#8221;, as in &#8220;bookmark this with delicious.&#8221;</li>
</ul></li>
<li><code>format</code>: describes the intended or expected form of the result

<ul>
<li>example: in English, arguments marked by &#8220;in&#8221; as in &#8220;in PDF form&#8221; or &#8220;in German&#8221;</li>
</ul></li>
<li><code>alias</code>: a name or reference

<ul>
<li>example: in English, arguments marked by &#8220;as&#8221; as in &#8220;tag this as new&#8221; or &#8220;login to mail as aza.&#8221;</li>
</ul></li>
</ul>

<p>Note that all three locational roles, <code>goal</code>, <code>source</code>, and <code>position</code>, may be used for both times and places, as the morphological marking of temporal and spatial expressions is often conflated in language. The appropriate type of referent (time or space) can then be specified with the noun type.</p>

<p>As a quick sanity check of this proposal, here are all the commands in the standard feeds built into Ubiquity which take multiple arguments, together with the semantic role appropriate for each argument:</p>

<table>
<thead>
<tr><th>command</th><th>current modifier</th><th>semantic role</th></tr>
</thead>
<tbody style='font-family: monospace'>
<tr><th>convert</th><td>to</td><td>goal, format</td></tr>
<tr><th>email</th><td>to</td><td>goal</td></tr>
<tr><th rowspan='2'>translate</th><td>to</td><td>goal, format</td></tr>
<tr><td>from</td><td>source</td></tr>
<tr><th>search</th><td>with</td><td>instrument</td></tr>
<tr><th>wikipedia</th><td>in</td><td>format</td></tr>
<tr><th>yelp</th><td>near</td><td>position</td></tr>
<tr><th>weather</th><td>in</td><td>position</td></tr>
<tr><th>twitter</th><td>as</td><td>alias</td></tr>
<tr><th rowspan='2'>share-on-delicious</th><td>tagged</td><td>alias</td></tr>
<tr><td>entitled</td><td>alias</td></tr>
</tbody>
</table>

<p>The only problematic standard command, then, is the <code>share-on-delicious</code> command, which can take both tags and a title, both of which would most naturally correspond to the <code>alias</code> role. <strong>If you have a suggestion for how best to deal with this type of case, I&#8217;d love to hear it!</strong></p>
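<p>This kind of clash is easy to detect mechanically. Here is a rough sketch, assuming each command is described by a plain object mapping its current modifiers to roles (a hypothetical shape, not the actual Ubiquity command format); the role assignments come from the table above, with <code>translate</code>&#8217;s &#8220;to&#8221; simplified to <code>goal</code>.</p>

```javascript
// Roles taken from the table above; the object shape itself is hypothetical.
const commandRoles = {
  "translate": { to: "goal", from: "source" },
  "share-on-delicious": { tagged: "alias", entitled: "alias" },
};

// List the roles that more than one modifier of a command maps to.
function duplicatedRoles(modifiers) {
  const counts = {};
  for (const role of Object.values(modifiers)) {
    counts[role] = (counts[role] || 0) + 1;
  }
  return Object.keys(counts).filter((role) => counts[role] > 1);
}

console.log(duplicatedRoles(commandRoles["share-on-delicious"]));
```

<p>An empty result means every argument of the command can be told apart by role alone.</p>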

<p>We&#8217;d love to get your feedback on this proposed set of semantic roles. <strong>How do you feel about the roles laid out here?</strong> In particular, if you have a command, or can envision a command, which needs an argument that does not fit any of these roles or which would take multiple arguments of the same role, please let us know! ^^</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>The <a href="http://scholar.google.com/scholar?q=&quot;types+of+lexical+information&quot;+fillmore">Fillmore (1971)</a> semantic role of &#8220;result&#8221; may also be lumped into this.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/rolling-out-the-roles/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Scoring and Ranking Suggestions</title>
		<link>http://mitcho.com/blog/observation/scoring-and-ranking-suggestions/</link>
		<comments>http://mitcho.com/blog/observation/scoring-and-ranking-suggestions/#comments</comments>
		<pubDate>Tue, 07 Apr 2009 07:17:26 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[observation]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[candidates]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[constraints]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[Optimality Theory]]></category>
		<category><![CDATA[order]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[ranking]]></category>
		<category><![CDATA[score]]></category>
		<category><![CDATA[suggestions]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1745</guid>
		<description><![CDATA[I just spent some time reviewing how Ubiquity currently ranks its suggestions in relation to Parser The Next Generation so I thought I&#8217;d put some of these thoughts down in writing. The issue of ranking Ubiquity suggestions can be restated as predicting an optimal output given a certain input and various conflicting considerations. Ubiquity [...]
]]></description>
			<content:encoded><![CDATA[<p>I just spent some time reviewing how Ubiquity currently ranks its suggestions in relation to <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">Parser The Next Generation</a> so I thought I&#8217;d put some of these thoughts down in writing.</p>

<p>The issue of ranking Ubiquity suggestions can be restated as predicting an optimal output given a certain input and various conflicting considerations. Ubiquity (0.1.8, as of this writing) computes four &#8220;scores&#8221; for each suggestion:</p>

<p><span id="more-1745"></span></p>

<ol>
<li><code>duplicateDefaultMatchScore</code>: 100 by default—lowered if an unused argument gets multiple suggestions (in <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/file/0aaeae361c33/ubiquity/modules/parser/parser.js#l558">the words of the code</a>: &#8220;reduce the match score so that multiple entries with the same verb are only shown if there are no other verbs.&#8221;)</li>
<li><code>frequencyMatchScore</code>: a score from the <code>suggestion memory</code> for the frequency of the suggestion&#8217;s verb, given the input verb (currently the first word), or given nothing in the case of noun-first suggestions</li>
<li><code>verbMatchScore</code>: float in [0,1]: (as described <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_Documentation#Scoring_the_Quality_of_the_Verb_Match">here</a>)

<ul>
<li>0.75 is returned in case it is a noun-first suggestion (by virtue of the fact that <code>String.indexOf('')==0</code>)</li>
<li>1 if the verb name is equivalent across input-output</li>
<li>in [0.75,1) if the input is a prefix of the suggestion verb name</li>
<li>in [0.5,0.75) if the input is a non-prefix substring of the suggestion verb</li>
<li>in [0.25,0.5] if the input is a prefix of one of the <code>synonyms</code></li>
<li>in [0,0.25) if the input is a non-prefix substring of one of the <code>synonyms</code></li>
</ul></li>
<li><code>argMatchScore</code>: the number of arguments with matching &#8220;specific&#8221; nountypes, where &#8220;specific&#8221; is designated by the nountype having property <code>rankLast=false</code>.</li>
</ol>
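<p>To make the <code>verbMatchScore</code> bands concrete, here is a rough sketch which lands in the ranges listed above. Scaling linearly by the ratio of input length to name length is an assumption on my part, as are the verb object shape (<code>name</code>, <code>synonyms</code>) and returning 0 when nothing matches; see the linked documentation for the actual formula.</p>

```javascript
// Sketch of a verb match score landing in the bands listed above.
// Assumptions: linear scaling by length ratio, and a verb shaped like
// { name: "email", synonyms: ["mail", "send"] }.
function verbMatchScore(input, verb) {
  const ratio = (name) => input.length / name.length;

  const nameIndex = verb.name.indexOf(input);
  // Prefix of the name: 1 when identical, 0.75 when input is "" (noun-first).
  if (nameIndex === 0) return 0.75 + 0.25 * ratio(verb.name);
  // Non-prefix substring of the name.
  if (nameIndex > 0) return 0.5 + 0.25 * ratio(verb.name);

  // Otherwise take the best score among the synonyms.
  let best = 0; // stays 0 when nothing matches (an assumption)
  for (const synonym of verb.synonyms || []) {
    const index = synonym.indexOf(input);
    if (index === 0) best = Math.max(best, 0.25 + 0.25 * ratio(synonym));
    else if (index > 0) best = Math.max(best, 0.25 * ratio(synonym));
  }
  return best;
}

console.log(verbMatchScore("tr", { name: "translate", synonyms: [] }));
```

<p>Note how the empty input falls out of the prefix case with a ratio of 0, which is exactly the 0.75 noun-first behaviour described above.</p>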

<p>With the numeric scores for each of these criteria, a partial order of suggestions is constructed using a <a href="http://en.wikipedia.org/wiki/Lexicographic_order">lexicographic order</a>: that is, compare candidates first using <code>duplicateDefaultMatchScore</code>, break ties using <code>frequencyMatchScore</code>, if still tied break using <code>verbMatchScore</code>, and if still tied break using <code>argMatchScore</code>. Constraints arranged this way are called &#8220;strictly ranked&#8221;: a lower constraint, no matter how well a candidate scores on it, can never overcome a loss at a higher constraint. A crucial corollary is that a lower constraint&#8217;s score need not be computed for a candidate that a higher constraint has already doomed to a lower position.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>
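<p>The strict ranking can be sketched as a sort comparator. This is only an illustration, not the code in <code>parser.js</code>: the suggestion objects and their values are made up, and I am assuming that a higher score is better on every constraint.</p>

```javascript
// Constraint names come from the list above, highest-ranked first.
const constraints = [
  "duplicateDefaultMatchScore",
  "frequencyMatchScore",
  "verbMatchScore",
  "argMatchScore",
];

// Lexicographic comparison: the first constraint on which two
// candidates differ decides; lower constraints are never consulted.
function compareSuggestions(a, b) {
  for (const key of constraints) {
    if (a[key] !== b[key]) return b[key] - a[key]; // higher score first
  }
  return 0;
}

// Made-up suggestions for illustration.
const suggestions = [
  { verb: "email", duplicateDefaultMatchScore: 100, frequencyMatchScore: 2, verbMatchScore: 0.8, argMatchScore: 0 },
  { verb: "edit", duplicateDefaultMatchScore: 100, frequencyMatchScore: 2, verbMatchScore: 0.8, argMatchScore: 1 },
  { verb: "translate", duplicateDefaultMatchScore: 100, frequencyMatchScore: 5, verbMatchScore: 0.5, argMatchScore: 0 },
];

const ranked = suggestions.slice().sort(compareSuggestions);
console.log(ranked.map((s) => s.verb)); // [ 'translate', 'edit', 'email' ]
```

<p>Here <code>translate</code> wins on <code>frequencyMatchScore</code> despite its lower <code>verbMatchScore</code>, while <code>argMatchScore</code> only gets a say between the two candidates tied on everything above it.</p>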

<h3>Ranking in The Next Generation</h3>

<p>One of the goals of <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">Parser The Next Generation</a> is to make noun/argument-first input a first-class citizen of Ubiquity, improving its suggestions, in particular to the benefit of <a href="http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/">verb-final languages</a>. Arguments will be split up and tested against different noun types before a verb is even entered into the input, in which case target verbs can be ranked according to the appropriateness of the input&#8217;s arguments. As such, I believe the <code>argMatchScore</code> criterion above should either be ranked higher in a strictly ranked model or be allowed to overtake lower scores for the higher constraints in a non-strictly ranked model.</p>

<p>The <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">Parser The Next Generation</a> proposal and <a href="http://mitcho.com/code/ubiquity/parser-demo">demo</a> currently orders using a product of various criteria&#8217;s scores, rather than a lexicographic order of strictly ranked constraints. The component factors are:</p>

<ol>
<li><code>0.5</code> for parses where the verb was suggested</li>
<li><code>0.5</code> for each extra (>1) <code>object</code> argument (essentially &#8220;unused words&#8221; in the previous parser)</li>
<li>the score of each argument against that semantic role&#8217;s target noun type</li>
<li><code>0.8</code> for each unset argument of that verb</li>
</ol>

<p>Each component score is a value in [0,1], so the running score is always non-increasing across the derivation. This offers a natural way to optimize the candidate set creation: if a possible parse ever gets a score below a magic &#8220;threshold&#8221; value, it is immediately thrown away.</p>
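<p>A minimal sketch of that pruning, assuming the factors for a parse arrive one at a time as the derivation proceeds. The function name and the threshold value are hypothetical and not taken from the demo.</p>

```javascript
// Multiply factors in [0,1] one at a time. Since the running score can
// never go back up, a parse is discarded as soon as it drops below the
// threshold, without looking at its remaining factors.
function scoreParse(factors, threshold) {
  let score = 1;
  for (const factor of factors) {
    score *= factor;
    if (threshold > score) return null; // cannot recover: throw the parse away
  }
  return score;
}

console.log(scoreParse([0.5, 0.8], 0.3));      // 0.4
console.log(scoreParse([0.5, 0.5, 0.8], 0.3)); // null
```

<p>In the second call the parse is dropped after the second factor, so the third one never needs to be computed.</p>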

<p>A possible problem with the current Parser TNG scoring model is that it will implicitly hinder verbs and parses with more arguments, as they could have more sub-1 noun type score factors. This consideration may be great enough that a weighted additive model should be considered over a multiplicative one.</p>
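<p>To illustrate the concern, compare a parse with one argument to a parse with three, where every argument scores 0.9 against its noun type. The weighted average below is just one possible form of an additive model, and the equal weights are an assumption chosen for illustration.</p>

```javascript
// Multiplicative model: every extra sub-1 factor lowers the score.
const product = (scores) => scores.reduce((acc, s) => acc * s, 1);

// One possible additive model: a weighted sum normalized by total weight.
const weightedAverage = (scores, weights) => {
  const total = weights.reduce((acc, w) => acc + w, 0);
  return scores.reduce((acc, s, i) => acc + s * weights[i], 0) / total;
};

const oneArg = [0.9];
const threeArgs = [0.9, 0.9, 0.9];

console.log(product(oneArg), product(threeArgs));
console.log(weightedAverage(oneArg, [1]), weightedAverage(threeArgs, [1, 1, 1]));
```

<p>Under the product, the three-argument parse falls to roughly 0.73 even though each of its arguments is as good as the single argument of the other parse; under the average, both stay at about 0.9.</p>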

<p><strong>How do you think we can make Ubiquity&#8217;s suggestion ranking smarter? What other factors should be considered, and what factors could be left out?</strong></p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>For all the linguists in the audience, if this sounds like <a href="http://en.wikipedia.org/wiki/Optimality_Theory">Optimality Theory</a>, you would be right: there&#8217;s a little bit of <a href="http://roa.rutgers.edu/view.php3?roa=537">Prince and Smolensky (1993)</a> hanging out <a href="http://ubiquity.mozilla.com">in your browser</a>.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/observation/scoring-and-ranking-suggestions/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

