<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>mitcho.com &#187; algorithm</title>
	<atom:link href="http://mitcho.com/blog/tag/algorithm/feed/" rel="self" type="application/rss+xml" />
	<link>http://mitcho.com</link>
	<description></description>
	<lastBuildDate>Fri, 10 Feb 2012 23:24:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4-alpha-19719</generator>
		<item>
		<title>Exploring Command Chaining in Ubiquity: Part 1</title>
		<link>http://mitcho.com/blog/projects/exploring-command-chaining-in-ubiquity-part-1/</link>
		<comments>http://mitcho.com/blog/projects/exploring-command-chaining-in-ubiquity-part-1/#comments</comments>
		<pubDate>Wed, 19 Aug 2009 20:35:37 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[asynchronous]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[nountypes]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[verbs]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2760</guid>
		<description><![CDATA[Since the dawn of time people have been asking about command chaining in Ubiquity. If you have a translate command and an email command, it would be great to be able to, for example, translate hello to Spanish and email to Juanito. This is what we call command chaining or piping: in a single complex [...]
]]></description>
			<content:encoded><![CDATA[<p>Since the <a href="http://labs.mozilla.com/2008/08/introducing-ubiquity/">dawn of time</a> people have been asking about command chaining in Ubiquity. If you have a <code>translate</code> command and an <code>email</code> command, it would be great to be able to, for example, <code>translate hello to Spanish and email to Juanito</code>. This is what we call <strong>command chaining</strong> or <strong><a href="http://en.wikipedia.org/wiki/Pipeline_(Unix)">piping</a></strong>: in a single complex query, specifying multiple (probably two) actions and using the first&#8217;s output as the second&#8217;s input.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<p>Today I hope to cover some of the technical considerations required in implementing command chaining in Ubiquity, and I will follow up soon with a blog post on the linguistic considerations required as well.</p>

<p><span id="more-2760"></span></p>

<h3>Technical considerations: hooking the pipes together</h3>

<p>I&#8217;d first like to lay out some technical challenges and questions. These can be broken into two different categories: (1) how the parse and display of suggestions is affected and (2) how the execution is affected.</p>

<h4>Matching inputs and outputs</h4>

<p>We&#8217;ll first consider how command chaining may affect the parsing. Each Ubiquity command specifies the types of arguments it expects using different <strong>noun types</strong>, such as <code>noun_arb_text</code>, which accepts anything, <code>noun_type_number</code>, which accepts numbers, or <code>noun_type_language</code>, which takes the name of a language. For example, the <code>translate</code> verb takes at most three arguments: a <code>noun_arb_text</code> object, a <code>noun_type_language</code> goal (the language to translate into), and a <code>noun_type_language</code> source (the source language). In implementing command chaining, it will be necessary to identify the appropriate noun types for the <em>output</em> of a command.</p>

<p>The first question we must address here is <strong>&#8220;what is the chaining output of a command&#8221;?</strong> Is it the preview text? Some text output from the execution?</p>

<p><a href='http://www.flickr.com/photos/joemud/2851415655/'><img src="http://mitcho.com/blog/wp-content/uploads/2009/08/2851415655_1012a4cce0_o.jpg" alt="2851415655_1012a4cce0_o.jpg" border="0" width="650" height="226" /></a><br/><small><a href='http://www.flickr.com/photos/joemud/2851415655/'>Big fish eat da lil fish</a> by joemud, CC-SA-NC</small></p>

<p>To put this question into perspective, we note that Ubiquity commands can be broadly classified into two types: <strong>lookup</strong> and <strong>action execution</strong>. Here&#8217;s a classification which I believe to be exhaustive:<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup></p>

<table>
<tr><th>classification</th><th>preview</th><th>execution</th><th>example</th></tr>
<tr><td rowspan='4'>lookup</td><td rowspan='4'>data lookup</td><td>inserting result into page</td><td><code>translate</code></td></tr>
<tr><td>opening a website</td><td><code>weather</code>, most search commands</td></tr>
<tr><td>copying result to pasteboard</td><td><code>get email address</code></td></tr>
<tr><td>nothing</td><td><code></code></td></tr>
<tr><td>action</td><td>nothing (maybe a description<br/>of what the action will do)</td><td>an action which changes some state <br/>(in the browser or on the web)</td><td><code>quit firefox</code>, <code>email</code>, <code>twitter</code></td></tr>
</table>

<p>In light of this classification, I believe that lookup commands are much more likely than action commands to be the first verb in a command chain. Chains that start with an action, such as <code>email hello to Blair and then do ...</code> or <code>twitter hello and then ...</code>, are quite unlikely.</p>

<table>
<tr><th>first verb type</th><th>second verb type</th><th>example</th></tr>
<tr><td>lookup</td><td>action</td><td><code>translate this to Spanish and email to Aza</code></td></tr>
<tr><td>lookup</td><td>lookup</td><td><code>translate this to English and then find it with Amazon</code></td></tr>
<tr><td>action</td><td>action/lookup</td><td><i>no use case?</i></td></tr>
</table>

<p>Thus, in the same way that not all commands have a useful execution,<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup> <strong>perhaps only lookup commands will have a chainable output: the results of the lookup.</strong> Even with this restriction, we will most likely need to implement a new &#8220;chainable output&#8221; method or getter in these commands. This means that commands will need to opt in to become chainable, but I believe this is a necessary evil.</p>
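
<p>As a rough illustration of the opt-in idea, here is a minimal sketch in plain JavaScript. The <code>chainableOutput</code> method, the two command objects, and the <code>isChainable</code> helper are all hypothetical names made up for this example; none of them is an existing Ubiquity API.</p>

```javascript
// Hypothetical command objects: only the lookup command opts in
// by defining a chainableOutput method (not a real Ubiquity API).
const translateCommand = {
  name: "translate",
  // Stand-in lookup; a real command would call a translation service.
  chainableOutput: function (args) {
    const dictionary = { hello: "hola" };
    return dictionary[args.object] || args.object;
  }
};

const quitCommand = {
  name: "quit firefox"
  // No chainableOutput here: action commands do not opt in.
};

// A command can be the first verb of a chain only if it has opted in.
function isChainable(command) {
  return typeof command.chainableOutput === "function";
}

console.log(isChainable(translateCommand)); // true
console.log(isChainable(quitCommand)); // false
console.log(translateCommand.chainableOutput({ object: "hello" })); // hola
```

<p>Under this assumption, the parser could skip any first verb for which <code>isChainable</code> is false without ever running it.</p>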

<p>The second question we must address is <strong>&#8220;when do we establish the noun type of a command&#8217;s chainable output?&#8221;</strong> One unsung but crucial feature of the way Ubiquity works now is that suggestions&#8217; previews are not computed until that suggestion is selected (except for the first suggestion, which in most skins gets previewed immediately). Should we wait for all of the first verbs&#8217; chainable output to be computed and then run them through the <a href="http://mitcho.com/blog/projects/judging-noun-types/">noun type detection system</a>? Or should verbs with chainable output also <em>a priori</em> specify what noun types their output will be?</p>

<p>Both of these approaches have their problems. If we compute the chainable output of the first verb, run a noun type detection on it and <em>then</em> suggest the full combination if it matches what the second verb was expecting, this will have clear performance implications, not to mention that it could greatly complicate our <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2">parsing algorithm</a>. While the <em>a priori</em> approach doesn&#8217;t have these performance implications, it does mean that each command will have to list (by name or reference) the noun types that will match its output. If a command author is unaware of someone else&#8217;s noun type, that chain will be impossible, even if the chainable output itself does indeed match that noun type. The <em>a posteriori</em> approach would never have this issue. <strong>What other benefits or problems do you foresee with either of these approaches? Is there another approach which avoids these pitfalls?</strong></p>
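
<p>The trade-off between the two approaches can be sketched as follows. Everything here is an assumption for illustration: the simplified noun types, the <code>outputNounTypes</code> and <code>inputNounType</code> properties, and the made-up verbs <code>countWords</code> and <code>square</code> do not correspond to real Ubiquity code.</p>

```javascript
// Hypothetical noun types, each with a name and a detector function.
const nounTypes = [
  { name: "number", accepts: (text) => /^-?\d+(\.\d+)?$/.test(text) },
  { name: "arb_text", accepts: (text) => text.length > 0 }
];

// A priori: the first verb declares its output noun types up front,
// so matching needs no execution of the first verb.
function matchesAPriori(firstVerb, secondVerb) {
  return firstVerb.outputNounTypes.includes(secondVerb.inputNounType);
}

// A posteriori: compute the output first, then run detection on it.
function matchesAPosteriori(firstVerb, input, secondVerb) {
  const output = firstVerb.chainableOutput(input);
  const expected = nounTypes.find((t) => t.name === secondVerb.inputNounType);
  return expected === undefined ? false : expected.accepts(output);
}

// Made-up verbs: the author of countWords only declared arb_text.
const countWords = {
  outputNounTypes: ["arb_text"],
  chainableOutput: (text) => String(text.split(/\s+/).filter(Boolean).length)
};
const square = { inputNounType: "number" };

console.log(matchesAPriori(countWords, square)); // false
console.log(matchesAPosteriori(countWords, "one two three", square)); // true
```

<p>The a priori check misses a chain that would actually work because the declaration is incomplete, while the a posteriori check finds it at the cost of running the first verb before suggesting anything.</p>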

<h4>(A)synchronous composability</h4>

<p>Once we have the noun types, parsing, and suggestions down, all that remains is to compute the previews and implement the composite execution. Since the Ubiquity command manager already wraps the preview and execute functions in a wrapper to facilitate localization, among other uses, it would be easy to make the command manager <a href="http://www.croczilla.com/blog/16">compose asynchronous processes pseudo-synchronously</a>. No major changes should be necessary to do the previews and executions, though, again, there will be a performance cost.</p>
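
<p>To make the composition idea concrete, here is a minimal callback-based sketch. It is not the command manager&#8217;s actual wrapper; the <code>chain</code> helper and the stand-in functions <code>fakeTranslate</code> and <code>fakeEmail</code> are hypothetical and exist only for this example.</p>

```javascript
// Chain two callback-style steps: the output of `first` becomes the
// input of `second`. Works whether the callbacks fire now or later.
function chain(first, second) {
  return function (input, done) {
    first(input, function (intermediate) {
      second(intermediate, done);
    });
  };
}

// Stand-in asynchronous lookup (a real one would query a web service).
function fakeTranslate(text, callback) {
  setTimeout(function () {
    callback(text === "hello" ? "hola" : text);
  }, 10);
}

// Stand-in action that only reports what it would send.
function fakeEmail(body, callback) {
  callback("would email: " + body);
}

const translateThenEmail = chain(fakeTranslate, fakeEmail);
translateThenEmail("hello", function (result) {
  console.log(result); // would email: hola
});
```

<p>The caller sees a single function with a single callback, which is what lets the two steps read as if they ran synchronously.</p>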

<h3>Conclusion</h3>

<p>There are a number of technical questions which must be answered, mostly in the parsing/suggesting stage. The key questions to answer are:</p>

<ol>
<li>What is the chaining output of a command?</li>
<li>When do we establish the noun type of a command&#8217;s chainable output?</li>
</ol>

<p>I&#8217;ll make another post soon on the linguistic considerations necessary in making command chaining happen in a <a href="http://mitcho.com/blog/projects/how-natural-should-a-natural-interface-be/">natural</a> fashion.</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>We&#8217;re going to limit our discussion here to this restriction that the two verbs are not simply two simultaneous commands, but two commands which operate successively on an input, i.e., that it is true piping. This for example rules out input such as <code>google dogs and translate cat to Spanish</code>, as the second command&#8217;s execution does not semantically depend on the first&#8217;s execution. This (hopefully uncontroversial) decision also affects the linguistic considerations to be made in my next post.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:2">
<p>If you know of a command which doesn&#8217;t neatly fit into &#8220;lookup&#8221; or &#8220;action&#8221;, please let me know.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:3">
<p>I believe we should mark these no-execution lookup commands visually so the user does not expect anything to happen if they execute it. This is <a href="http://ubiquity.mozilla.com/trac/ticket/651">trac #651</a>.&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/exploring-command-chaining-in-ubiquity-part-1/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Nountype Quirks: Day 3: Geo Day</title>
		<link>http://mitcho.com/blog/projects/nountype-quirks-day-3/</link>
		<comments>http://mitcho.com/blog/projects/nountype-quirks-day-3/#comments</comments>
		<pubDate>Sat, 01 Aug 2009 04:20:22 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[localization]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[nountypes]]></category>
		<category><![CDATA[scoring]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2647</guid>
		<description><![CDATA[It&#8217;s time for one more installment of Nountype Quirks, where I review and tweak Ubiquity&#8217;s built-in nountypes. For an introduction to this effort, please read Judging Noun Types and my updates from Day 1 and Day 2. Today I ended up spending most of the day attempting to implement (but not yet completing) major improvements [...]
]]></description>
			<content:encoded><![CDATA[<p><style type='text/css'>
.scorebar {
  background-color:red;
  display:inline-block;
  height:0.5em;
  vertical-align:middle;
}
.scoretable td {
  font-size: 0.7em;
}
</style></p>

<p>It&#8217;s time for one more installment of Nountype Quirks, where I review and tweak <a href="http://ubiquity.mozilla.com">Ubiquity</a>&#8217;s built-in nountypes. For an introduction to this effort, please read <a href="http://mitcho.com/blog/projects/judging-noun-types/">Judging Noun Types</a> and my updates from <a href="http://mitcho.com/blog/projects/nountype-quirks-day-1/">Day 1</a> and <a href="http://mitcho.com/blog/projects/nountype-quirks-day-2/">Day 2</a>.</p>

<p>Today I ended up spending most of the day attempting to implement (but not yet completing) major improvements to the geolocation-related nountypes whose plans I lay out here.</p>

<p><em>Note: this blog post includes a number of graphs using HTML/CSS formatting. If you are reading this article through a feed reader or planet, I invite you to read it <a href="http://mitcho.com/blog/projects/nountype-quirks-day-3/">on my site</a>.</em><span id="more-2647"></span></p>

<h3><code>noun_type_geolocation</code></h3>

<p><code>noun_type_geolocation</code> is the nountype used by the <code>weather</code> command for its location argument in input like &#8220;weather near Chicago&#8221;. The neat feature of <code>noun_type_geolocation</code> is that it has a smart default value which uses Firefox&#8217;s geolocation system to give you your current location by default, so I can enter &#8220;weather&#8221; and get the suggestion &#8220;weather near Broomfield, Colorado&#8221; (not completely correct, but close enough for the weather). Otherwise, however, <code>noun_type_geolocation</code> does not do too hot&#8230; for any input you give it, it&#8217;ll just accept it with a score of 0.3, much like <code>noun_arb_text</code>. We could do better.</p>

<p>One issue with this <code>noun_type_geolocation</code> is a conceptual one. Is this nountype supposed to accept only municipalities? Countries? Or should it accept landmarks or addresses as well? Part of the issue is that it&#8217;s only used by one built-in command in Ubiquity now, <code>weather</code>. But to be called a general &#8220;geolocation&#8221; nountype, its output should not be specific to <code>weather</code>&#8217;s usage, which is to throw the result at the <a href="http://wunderground.com">Weather Underground</a> API.</p>

<p>I propose that we change this to be something like <code>noun_type_geo_town</code> and also make similar nountypes like <code>noun_type_geo_country</code>, <code>noun_type_geo_region</code>, going all the way down to <code>noun_type_address</code> (which already exists—see below). All of the nountypes in this family could use a geocoding API such as <a href="http://code.google.com/apis/maps/documentation/geocoding/index.html">Google&#8217;s</a> or <a href="http://developer.yahoo.com/maps/rest/V1/geocode.html">Yahoo&#8217;s</a>. Their <code>data</code> properties could include all of this geocoded geographic data (in English) and also the latitude/longitude coordinate data.</p>

<p>The <code>weather</code> command could then accept <code>noun_type_geo_town</code>. However, some municipalities are not in Weather Underground, and for some countries its data is only as granular as administrative districts, so we could display the results of the geocoding API while giving Weather Underground the geocoded latitude/longitude data.</p>

<h3><code>noun_type_async_address</code></h3>

<p><code>noun_type_async_address</code> attempts to do exactly what I&#8217;ve laid out above for the most granular level: that of geolocations with data all the way down to the street level. This is the nountype which is used for the built-in <code>map</code> command and uses the <a href="http://developer.yahoo.com/maps/rest/V1/geocode.html">Yahoo geocoding service</a> to accomplish this. Let&#8217;s see what kinds of results it returns:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='1'>mitcho</td><td>mitcho</td><td><span class='scorebar' style='width:250px'></span> 0.5</td></tr>
<tr><td rowspan='1'>grenada</td><td>grenada</td><td><span class='scorebar' style='width:450px'></span> 0.9</td></tr>
<tr><td>jono</td><td>jono</td><td><span class='scorebar' style='width:450px'></span> 0.9</td></tr>
<tr><td>mountain view</td><td>mountain view</td><td><span class='scorebar' style='width:450px'></span> 0.9</td></tr>
</table>

<p>Let&#8217;s lay out some immediate quirks:</p>

<ol>
<li>All scores are either 0.5 or 0.9. In general, if the Yahoo API returns some geocoded interpretation, it gets 0.9, but otherwise it accepts everything with 0.5.</li>
<li>The results that came back from the Yahoo service don&#8217;t add any useful information like the country or administrative region. Even the case stays lowercase.</li>
<li>Since when is Jono a location!? I&#8217;ll get back to this later.</li>
</ol>

<p>For starters, the Yahoo! Maps API terms of service dictate that we can&#8217;t use its geocoding service if we&#8217;re not also displaying Yahoo maps, so I rewrote the nountype using the Google API, which also has the advantage of offering JSON output.</p>

<p>One quirk of the Google Geocoding API, though, is that all of the resulting municipality names are only in English. Try for example queries for <a href="http://maps.google.com/maps/geo?q=Wien&amp;output=json&amp;oe=utf8&amp;sensor=false">Wien</a> or <a href="http://maps.google.com/maps/geo?q=%E6%9D%B1%E4%BA%AC&amp;output=json&amp;oe=utf8&amp;sensor=false">東京 (Tokyo)</a>. Since we want our suggestions to only add information to our input, not replace the input entirely (and especially not in another language), we&#8217;ll then only take results which have the input as an initial substring. On the other hand, if none of the results have the input as a proper prefix of the return value, we will take the geocoding information from the first result but with the original input as the display text. Such results will have a markedly lower score.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<p>As this is the <code>address</code> nountype, we&#8217;ll penalize results which do not have detailed information such as street address or town-level information. All of this is very easy to judge as every result from the API has a <a href="http://code.google.com/intl/ja/apis/maps/documentation/geocoding/index.html#GeocodingAccuracy">geocoding accuracy</a> value.</p>
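
<p>The selection and scoring rules described above could be sketched roughly like this. The result shape (<code>name</code>, <code>accuracy</code>), the base scores 0.9 and 0.4, and the maximum accuracy of 9 are assumptions for illustration, not the values used in the actual rewrite.</p>

```javascript
// Assumed scale: accuracy runs from 0 (unknown) up to MAX_ACCURACY.
const MAX_ACCURACY = 9;

// Hypothetical geocoder results look like { name, accuracy }.
function suggestAddresses(input, results) {
  const lowered = input.toLowerCase();
  // Keep only results that extend the input rather than replace it.
  const prefixed = results.filter(
    (r) => r.name.toLowerCase().startsWith(lowered)
  );
  // Less detailed results are penalized in proportion to accuracy.
  const scale = (base, r) => base * (r.accuracy / MAX_ACCURACY);
  if (prefixed.length > 0) {
    return prefixed.map((r) => ({
      text: r.name, data: r, score: scale(0.9, r)
    }));
  }
  if (results.length === 0) {
    return [];
  }
  // Fallback: keep the user's text, borrow the first result's data,
  // and give it a markedly lower base score.
  const first = results[0];
  return [{ text: input, data: first, score: scale(0.4, first) }];
}

const viennaResults = [{ name: "Vienna, Austria", accuracy: 4 }];
console.log(suggestAddresses("Wien", viennaResults)[0].text); // Wien
```

<p>With this sketch, the input &#8220;Wien&#8221; keeps its original spelling as display text while still carrying the geocoded data for Vienna.</p>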

<h3>The best laid plans of mice and men&#8230;</h3>

<p>I spent a good few hours this afternoon and evening <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/rev/377daf3fe57a">attempting to implement</a> this new family of nountypes, including this new <code>nountype_geo_address</code>, but also <code>nountype_geo_subregion</code>, <code>nountype_geo_region</code>, and <code>nountype_geo_country</code>. Some of the quirks of the <code>weather</code> and <code>map</code> commands, however, have prevented me from completely replacing the legacy <code>noun_type_address</code> and <code>noun_type_geolocation</code> described above. I hope to continue this work again soon and actually make this transition, ideally before 0.5.2.</p>

<p>Look forward to one (or maybe two?) more episode(s) of Nountype Quirks where I hope to definitively explain, analyze, and tweak <code>matchScore</code>, the scoring algorithm which underlies the majority of the nountypes in Ubiquity. As always, I look forward to your comments and feedback.</p>

<h3>Bonus: Where&#8217;s Jono?</h3>

<p>It turns out that <code>noun_type_async_address</code> was recognizing &#8220;Jono&#8221; as an address because Jono is actually a location after all! Not only that, but Jono is in Japan!!</p>

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/08/Picture-31.png" alt="Picture 3.png" border="0" width="594" height="525" /></p>

<p>You clearly <a href="http://jonoscript.files.wordpress.com/2009/06/ubiquity_japanese.png">can&#8217;t take Japan out of Jono</a>, but it turns out you can&#8217;t take Jono out of Japan either.</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>If this crazy algorithm raises a red flag for anyone, you&#8217;re not alone&#8230; if you think of a more elegant solution, please let me know. This will no doubt be an issue when it comes to localizing the <code>address</code> nountype as well. I wish we could specify an output language for the Google Geocoding API&#8230; <img src='http://mitcho.com/blog/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' /> &#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/nountype-quirks-day-3/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Nountype Quirks: Day 2</title>
		<link>http://mitcho.com/blog/projects/nountype-quirks-day-2/</link>
		<comments>http://mitcho.com/blog/projects/nountype-quirks-day-2/#comments</comments>
		<pubDate>Thu, 30 Jul 2009 22:44:52 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[localization]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[nountypes]]></category>
		<category><![CDATA[scoring]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2635</guid>
		<description><![CDATA[Today I&#8217;m continuing the process of reviewing and tweaking all of the nountypes built into Ubiquity. For a more respectable introduction to this endeavor, please read my blog post from a couple days ago, Judging Noun Types and my status update from yesterday, Nountype Quirks: Day 1. Note: this blog post includes a number of [...]
]]></description>
			<content:encoded><![CDATA[<p><style type='text/css'>
.scorebar {
  background-color:red;
  display:inline-block;
  height:0.5em;
  vertical-align:middle;
}
.scoretable td {
  font-size: 0.7em;
}
</style></p>

<p>Today I&#8217;m continuing the process of reviewing and tweaking all of the nountypes built into <a href="http://ubiquity.mozilla.com">Ubiquity</a>. For a more respectable introduction to this endeavor, please read my blog post from a couple days ago, <a href="http://mitcho.com/blog/projects/judging-noun-types/">Judging Noun Types</a> and my status update from yesterday, <a href="http://mitcho.com/blog/projects/nountype-quirks-day-1/">Nountype Quirks: Day 1</a>.</p>

<p><em>Note: this blog post includes a number of graphs using HTML/CSS formatting. If you are reading this article through a feed reader or planet, I invite you to read it <a href="http://mitcho.com/blog/projects/nountype-quirks-day-2/">on my site</a>.</em></p>

<p><span id="more-2635"></span></p>

<h3><code>noun_type_twitter_user</code></h3>

<p>Let&#8217;s begin again by considering the suggestions and scores that a variety of different inputs to this nountype return and see what quirks we find.</p>

<p>To test this nountype, I made sure I had logged into <a href="http://twitter.com">Twitter</a> once with the login <a href="http://twitter.com/mitchoyoshitaka/"><code>mitchoyoshitaka</code></a>.</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='2'>mitcho</td><td>mitchoyoshitaka</td><td><span class='scorebar' style='width:425px'></span> 0.85</td></tr>
<tr><td>mitcho</td><td><span class='scorebar' style='width:250px'></span> 0.5</td></tr>
<tr><td rowspan='2'>mitchoyoshi</td><td>mitchoyoshitaka</td><td><span class='scorebar' style='width:470px'></span> 0.94</td></tr>
<tr><td>mitcho</td><td><span class='scorebar' style='width:250px'></span> 0.5</td></tr>
<tr><td>test</td><td>test</td><td><span class='scorebar' style='width:250px'></span> 0.5</td></tr>
<tr><td>テスト</td><td><i>none</i></td><td></td></tr>
<tr><td>hello world</td><td><i>none</i></td><td></td></tr>
<tr><td>@test</td><td><i>none</i></td><td></td></tr>
</table>

<p><a href="http://mitcho.com/blog/projects/nountype-quirks-day-1/">As nountypes go</a>, this is looking pretty good. For usernames which look like logins we&#8217;ve saved before, we&#8217;re using <code>matchScore</code> to get decent differential scores.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> It&#8217;s even ruling out impossible Twitter username strings, according to Twitter&#8217;s own restriction:</p>

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/twitter-usernames.png" alt="twitter-usernames.png" border="0" width="574" height="75" /></p>

<p>One possible improvement we could make is to let @ strings be accepted. I <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/rev/97871e3a453c">went ahead and made this improvement</a>. The initial @ will be stripped off and then will be checked as normal, but the final score will receive a slight boost using an <a href="http://en.wikipedia.org/wiki/nth_root"><i>n</i>th root</a> formula. The <code>twitter</code> command was also updated to deal with inputs with and without the initial @.</p>
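
<p>A small sketch of this kind of boost follows. The root of 1.25 (an exponent of 0.8) is an assumption picked because it roughly reproduces the scores in the table that follows; the committed code may use a different value. The username pattern and the <code>scoreTwitterUser</code> helper are likewise hypothetical.</p>

```javascript
// Taking the nth root of a score in (0, 1) raises it but keeps it
// below 1. ROOT = 1.25 is an assumed value for illustration.
const ROOT = 1.25;

// Letters, digits and underscores only (assumed restriction).
const VALID_USERNAME = /^[A-Za-z0-9_]+$/;

function scoreTwitterUser(input, baseScore) {
  const hasAt = input.startsWith("@");
  // Strip the initial @ before validating, as described above.
  const name = hasAt ? input.slice(1) : input;
  if (!VALID_USERNAME.test(name)) {
    return null; // no suggestion for impossible usernames
  }
  return hasAt ? Math.pow(baseScore, 1 / ROOT) : baseScore;
}

console.log(scoreTwitterUser("test", 0.5)); // 0.5
console.log(scoreTwitterUser("@test", 0.5).toFixed(2)); // 0.57
console.log(scoreTwitterUser("hello world", 0.5)); // null
```

<p>A root is a convenient choice here because it gives low scores a larger lift than high ones and can never push a score past 1.</p>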

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='2'>mitcho</td><td>mitchoyoshitaka</td><td><span class='scorebar' style='width:425px'></span> 0.85</td></tr>
<tr><td>mitcho</td><td><span class='scorebar' style='width:250px'></span> 0.5</td></tr>
<tr><td rowspan='2'>@mitcho</td><td>@mitchoyoshitaka</td><td><span class='scorebar' style='width:440px'></span> 0.88</td></tr>
<tr><td>@mitcho</td><td><span class='scorebar' style='width:285px'></span> 0.57</td></tr>

<tr><td>test</td><td>test</td><td><span class='scorebar' style='width:250px'></span> 0.5</td></tr>
<tr><td>@test</td><td>@test</td><td><span class='scorebar' style='width:285px'></span> 0.57</td></tr>
</table>

<p>Although the <code>noun_type_twitter_user</code> nountype is currently most used by the built-in <code>twitter</code> command to specify the user&#8217;s username, in theory it could also be used for example in a command which pulls up another user&#8217;s tweets. With that in mind, perhaps in the future we could check the browser history and/or bookmarks for entries of the form <code>http://twitter.com/...</code> and suggest those as well (<a href="http://ubiquity.mozilla.com/trac/ticket/846">trac #846</a>).</p>

<h3><code>noun_type_number</code></h3>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='1'>text</td><td><i>none</i></td><td></td></tr>
<tr><td>0.5</td><td>0.5</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td>0.5.1</td><td><i>none</i></td><td></td></tr>
</table>

<p>This nountype has an incredibly simple job and does it with ease. I&#8217;m going to leave it alone.</p>

<h3><code>noun_type_date</code> and <code>noun_type_time</code></h3>

<p><code>noun_type_date</code> and <code>noun_type_time</code> both use the magical <a href="https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/Date/parse">Date.parse</a> method to parse date- and time-like strings. Let&#8217;s first take a look at some of its suggestions:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th><code>date</code> suggestion</th><th><code>time</code> suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="250" height="15" /></th></tr>
<tr><td rowspan='1'>June 8th 5pm</td><td>2009-06-08</td><td>05:00 PM</td><td><span class='scorebar' style='width:250px'></span> 1</td></tr>
<tr><td rowspan='1'>5pm</td><td>2009-07-30</td><td>05:00 PM</td><td><span class='scorebar' style='width:250px'></span> 1</td></tr>
<tr><td rowspan='1'>5</td><td>2009-07-05</td><td>12:00 AM</td><td><span class='scorebar' style='width:250px'></span> 1</td></tr>
<tr><td rowspan='1'>June 8th</td><td>2009-06-08</td><td>12:00 AM</td><td><span class='scorebar' style='width:250px'></span> 1</td></tr>
<tr><td rowspan='1'>today</td><td>2009-07-30</td><td>12:00 AM</td><td><span class='scorebar' style='width:250px'></span> 1</td></tr>
<tr><td rowspan='1'>now</td><td>2009-07-30</td><td>02:40 PM</td><td><span class='scorebar' style='width:250px'></span> 1</td></tr>
<tr><td rowspan='1'>5pm is a good time</td><td><i>none</i></td><td><i>none</i></td><td></td></tr>
</table>

<p>The quirks in these outputs can be summed up into these two factors:</p>

<ol>
<li>There is no differential scoring at all.</li>
<li>Both nountypes parse the input with <a href="https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/Date/parse">Date.parse</a> and then just spit out the date or time components of the result. Thus time-only inputs get the default date and date-only inputs get the default time with equal scores.</li>
</ol>

<p>I just rewrote both nountypes and also added a new <code>noun_type_date_time</code>. Here are some of the features of the new implementation:</p>

<ol>
<li>If the input only contains digits and spaces, it is marked down.</li>
<li>With the exception of the inputs &#8216;today&#8217; and &#8216;now&#8217;, if the resulting <code>Date</code> object&#8217;s date is today, its date suggestion is scored lower; likewise, the time suggestion is scored lower when the time is the default value, &#8220;12:00 AM&#8221;.</li>
<li>Inputs (with the exception of &#8216;today&#8217; and &#8216;now&#8217;) which are shorter than the output string get a slight penalty. This factor reflects the intuition that a longer output than input means some generic information was added and thus there is less confidence in the output.</li>
</ol>
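<p>The three rules above can be sketched roughly as follows. This is only an illustration: the function name, its parameters, and all three penalty factors are placeholders I made up, not the values in the actual nountypes, so it will not reproduce the exact scores in the table below.</p>

```javascript
// Illustrative only: these penalty factors are made-up placeholders,
// not the values used by the real noun_type_date / noun_type_time.
const DIGITS_ONLY_PENALTY = 0.7;
const DEFAULT_VALUE_PENALTY = 0.5;
const PADDED_OUTPUT_PENALTY = 0.9;

// input:     what the user typed
// output:    the formatted suggestion text
// isDefault: true when the suggestion is only the fallback value
//            (today's date for a date, "12:00 AM" for a time)
function scoreSuggestion(input, output, isDefault) {
  const isSpecial = input === "today" || input === "now";
  let score = 1;

  // Rule 1: inputs made only of digits and spaces are marked down.
  if (/^[\d ]+$/.test(input)) score *= DIGITS_ONLY_PENALTY;

  if (!isSpecial) {
    // Rule 2: a fallback value carries little information.
    if (isDefault) score *= DEFAULT_VALUE_PENALTY;
    // Rule 3: an output longer than the input means something was filled in.
    if (output.length > input.length) score *= PADDED_OUTPUT_PENALTY;
  }
  return score;
}
```

<p>With these placeholder factors, <code>scoreSuggestion("5pm", "05:00 PM", false)</code> comes out higher than <code>scoreSuggestion("5pm", "2009-07-30", true)</code>, which is the ordering we want between the time and date readings of a time-only input.</p>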

<p>Here&#8217;s what some of the inputs give now:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>

<tr><td rowspan='3'>June 8th 5pm</td><td><code>date</code>: 2009-06-08</td><td><span class='scorebar' style='width:350px'></span> 0.7</td></tr>
<tr><td><code>time</code>: 05:00 PM</td><td><span class='scorebar' style='width:350px'></span> 0.7</td></tr>
<tr><td><code>date_time</code>: 2009-06-08&#160;05:00 PM</td><td><span class='scorebar' style='width:430px'></span> 0.86</td></tr>

<tr><td rowspan='3'>5pm</td><td><code>date</code>: 2009-07-30</td><td><span class='scorebar' style='width:135px'></span> 0.27</td></tr>
<tr><td><code>time</code>: 05:00 PM</td><td><span class='scorebar' style='width:405px'></span> 0.81</td></tr>
<tr><td><code>date_time</code>: 2009-07-30&#160;05:00 PM</td><td><span class='scorebar' style='width:245px'></span> 0.49</td></tr>

<tr><td rowspan='3'>5</td><td><code>date</code>: 2009-07-05</td><td><span class='scorebar' style='width:265px'></span> 0.53</td></tr>
<tr><td><code>time</code>: 12:00 AM</td><td><span class='scorebar' style='width:95px'></span> 0.19</td></tr>
<tr><td><code>date_time</code>: 2009-07-05&#160;12:00 AM</td><td><span class='scorebar' style='width:170px'></span> 0.34</td></tr>

<tr><td rowspan='3'>June 8th</td><td><code>date</code>: 2009-06-08</td><td><span class='scorebar' style='width:475px'></span> 0.95</td></tr>
<tr><td><code>time</code>: 12:00 AM</td><td><span class='scorebar' style='width:175px'></span> 0.35</td></tr>
<tr><td><code>date_time</code>: 2009-06-08&#160;12:00 AM</td><td><span class='scorebar' style='width:290px'></span> 0.58</td></tr>

<tr><td rowspan='3'>today</td><td><code>date</code>: 2009-07-30</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td><code>time</code>: 12:00 AM</td><td><span class='scorebar' style='width:225px'></span> 0.45</td></tr>
<tr><td><code>date_time</code>: 2009-07-30&#160;12:00 AM</td><td><span class='scorebar' style='width:350px'></span> 0.7</td></tr>

<tr><td rowspan='3'>now</td><td><code>date</code>: 2009-07-30</td><td><span class='scorebar' style='width:350px'></span> 0.7</td></tr>
<tr><td><code>time</code>: 04:34 PM</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td><code>date_time</code>: 2009-07-30&#160;04:34 PM</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>

</table>

<p>In addition, looking to the future we&#8217;d <a href="http://mitcho.com/blog/projects/ubiquity-localization-whats-new-whats-next/">like to make nountypes localizable</a> as well, and these two nountypes in particular will surely require some good thinking and planning to make localizable.</p>

<h3><code>noun_type_email</code> and <code>noun_type_contact</code></h3>

<p><code>noun_type_email</code> and <code>noun_type_contact</code> are two closely related nountypes. <code>noun_type_email</code> simply validates email address-looking strings, while <code>noun_type_contact</code> will return the <code>noun_type_email</code> suggestions and additionally return contacts from GMail if available.</p>

<p>The first thing to note is that I&#8217;ve often found the GMail contact lookup to be finicky in my own use. Reading through the code, I discovered the solution: GMail must either be open in a tab or you must use the &#8220;stay signed in&#8221; option and close the GMail tab.<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup> With this mystery solved, and <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/rev/8478c7103753">some code cleanup done to this contact fetching</a>, let&#8217;s take a look at some example suggestions: (suggestions overlapping with <code>noun_type_email</code> are not listed here)</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='1'>aza@m</td><td>aza@mozilla.com</td><td><span class='scorebar' style='width:210px'></span> 0.42</td></tr>
<tr><td rowspan='1'>jono</td><td>jdicarlo@mozilla.com</td><td><span class='scorebar' style='width:140px'></span> 0.28</td></tr>
<tr><td rowspan='1'>jdicarlo</td><td>jdicarlo@mozilla.com</td><td><span class='scorebar' style='width:95px'></span> 0.19</td></tr>
</table>

<p>In general, we see that these scores all look pretty poor. In particular, though, note that the &#8220;jono&#8221; input yielded a higher score for the same suggestion than &#8220;jdicarlo&#8221;, even though &#8220;jdicarlo&#8221; is longer and thus, intuitively, has more informational content and should maybe do better. Digging into the code I realized why this is. It was computing the scores by comparing &#8220;jono&#8221; and &#8220;jdicarlo&#8221; not simply to &#8220;Jono DiCarlo&#8221; and &#8220;jdicarlo@mozilla.com&#8221; respectively, but to the combined string &#8220;Jono DiCarlo &lt;jdicarlo@mozilla.com&gt;&#8221;. Now with <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/rev/0877848192f2">this change</a> in place, both the email address and name are analyzed individually and, due to the way nountype detection works in Parser 2, no duplicates are returned. Here are the updated results:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='1'>jono</td><td>jdicarlo@mozilla.com</td><td><span class='scorebar' style='width:415px'></span> 0.83</td></tr>
<tr><td rowspan='1'>jdicarlo</td><td>jdicarlo@mozilla.com</td><td><span class='scorebar' style='width:425px'></span> 0.85</td></tr>
</table>

<p>That&#8217;s much better!</p>
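<p>To see why scoring against the combined string hurts, here is a small sketch. Note that <code>toyMatchScore</code> is a deliberately naive stand-in I made up for illustration (input length divided by target length); it is not Ubiquity&#8217;s <code>matchScore</code>, and the numbers it produces are not the ones in the tables above. It only shows how the extra characters in the combined string dilute the score.</p>

```javascript
// Toy stand-in for a match scorer (NOT Ubiquity's matchScore):
// the share of the target covered by the input, or 0 if it is not found.
function toyMatchScore(input, target) {
  const found = target.toLowerCase().includes(input.toLowerCase());
  return found ? input.length / target.length : 0;
}

// Old behaviour: compare against "Name <address>" as one string.
function scoreCombined(input, contact) {
  return toyMatchScore(input, contact.name + " <" + contact.email + ">");
}

// New behaviour: compare against name and address separately, keep the best.
function scoreSeparately(input, contact) {
  return Math.max(
    toyMatchScore(input, contact.name),
    toyMatchScore(input, contact.email)
  );
}

const jono = { name: "Jono DiCarlo", email: "jdicarlo@mozilla.com" };
console.log(scoreCombined("jono", jono), scoreSeparately("jono", jono));
console.log(scoreCombined("jdicarlo", jono), scoreSeparately("jdicarlo", jono));
```

<p>In both cases the separate comparison scores higher, because the input is no longer being measured against characters from the field it was never meant to match.</p>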

<p>Now let&#8217;s consider the suggestions from <code>noun_type_email</code>. Here is what they originally looked like:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='1'>bpung</td><td><i>none</i></td><td></td></tr>
<tr><td rowspan='1'>bpung@m</td><td>bpung@m</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td rowspan='1'>bpung@mozilla.com</td><td>bpung@mozilla.com</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
</table>

<p><code>noun_type_email</code> is based on <a href="http://blog.livedoor.jp/dankogai/archives/51190099.html">a very robust regular expression</a> for <a href="http://www.ietf.org/rfc/rfc2822.txt">RFC 2822</a>. Unfortunately this means that it completely rules out strings such as &#8220;bpung&#8221; which could be a proper prefix of an email address, something I&#8217;ve argued against before (see footnote 2 of <a href="http://mitcho.com/blog/projects/judging-noun-types/">Judging Noun Types</a>). Moreover, due to a quirk of how nountypes based on regular expressions are scored, all results are given the score of 1.</p>

<p>I <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/rev/0d1803104c7d">just committed a change</a> to improve this behavior. The new version accepts strings which match the username part of the email address spec, sans @ and domain, but with a large score penalty.<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup> Moreover, domains are penalized if their final label (the <a href="http://en.wikipedia.org/wiki/Top-level_domain">top-level domain</a>) is only one letter long (unless the domain is an IP address) or if they contain no periods (.) at all. Here&#8217;s what the same inputs produce now:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='1'>bpung</td><td>bpung</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
<tr><td rowspan='1'>bpung@m</td><td>bpung@m</td><td><span class='scorebar' style='width:400px'></span> 0.8</td></tr>
<tr><td rowspan='1'>bpung@mozilla.com</td><td>bpung@mozilla.com</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
</table>
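<p>As a rough sketch of the new behavior: the 0.3 and 0.8 values below are read off the table above, but everything else is an assumption. In particular, the username pattern is a much-simplified stand-in for the RFC 2822 regular expression, and I&#8217;m assuming a single penalty applies whether the domain lacks a period, has a one-letter final label, or both.</p>

```javascript
// Simplified stand-in for the username part of the real RFC 2822 pattern.
const USERNAME = /^[A-Za-z0-9._%+-]+$/;
const IPV4 = /^\d{1,3}(\.\d{1,3}){3}$/;

// Returns a score between 0 and 1; 0 means "no suggestion".
function scoreEmail(input) {
  const parts = input.split("@");

  // Username only, no "@": accepted, but with a large penalty.
  if (parts.length === 1) return USERNAME.test(input) ? 0.3 : 0;

  if (parts.length !== 2 || !USERNAME.test(parts[0]) || parts[1] === "") return 0;

  const domain = parts[1];
  if (IPV4.test(domain)) return 1; // IP addresses are exempt from the domain checks

  const labels = domain.split(".");
  const topLevel = labels[labels.length - 1];
  const hasPeriod = labels.length > 1;
  const longEnough = topLevel.length > 1;

  // Assumed: one penalty if either domain check fails.
  return hasPeriod && longEnough ? 1 : 0.8;
}
```

<p>This gives 0.3 for &#8220;bpung&#8221;, 0.8 for &#8220;bpung@m&#8221;, and 1 for &#8220;bpung@mozilla.com&#8221;.</p>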

<h3>Same time, same channel</h3>

<p>I hope this post sheds light on the many changes I made together as well as the underlying thought process. If you don&#8217;t agree with any particular fix or analysis, please comment! I&#8217;ll be back again tomorrow with another installment of Nountype Quirks. Stay tuned!</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>Again, <code>matchScore</code> will be the subject of another blog post in the near future.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:2">
<p>Moreover, due to the way <code>noun_type_contact</code> caches the contact list internally, as long as GMail&#8217;s contacts are available once, you should be able to continue accessing those contacts&#8217; suggestions after logging out of GMail. There are also great performance benefits to this caching. The downside is that we currently have no way to know when to clear the cache, so even if you update your contacts in GMail, those new contacts won&#8217;t appear in Ubiquity until you restart Firefox.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:3">
<p>Perhaps this is a horrible idea, because if executed or previewed, any verb which uses these nountypes would have to deal with arguments which are not valid email addresses. In my mind, though, as long as it doesn&#8217;t actually cause any error, this should be okay. Keep in mind that, given the very low scores given to these suggestions, parses using it would most likely only show up if the verb which requires these nountypes was explicitly given and there are other arguments as well, for example in input like &#8220;email hello to bpung&#8221;. In such a situation, we would rather this suggestion not disappear until we type &#8220;@m&#8221;. If executed, the built-in email verb, for instance, will deal with this gracefully by simply putting the incomplete email address in the To field.&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/nountype-quirks-day-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nountype Quirks: Day 1</title>
		<link>http://mitcho.com/blog/projects/nountype-quirks-day-1/</link>
		<comments>http://mitcho.com/blog/projects/nountype-quirks-day-1/#comments</comments>
		<pubDate>Wed, 29 Jul 2009 23:00:56 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[nountypes]]></category>
		<category><![CDATA[scoring]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2598</guid>
		<description><![CDATA[Today I began the process of going through all of the nountypes built-in to Ubiquity using the principles and criteria I laid out yesterday—a task I&#8217;ve had in planning for a while now. As I explained yesterday, improved suggestions and scoring from the built-in nountypes could directly translate to better and smarter suggestions, resulting in [...]
]]></description>
			<content:encoded><![CDATA[<p><style type='text/css'>
.scorebar {
  background-color:red;
  display:inline-block;
  height:0.5em;
  vertical-align:middle;
}
.scoretable td {
  font-size: 0.7em;
}
</style></p>

<p>Today I began the process of going through all of the nountypes built into <a href="http://ubiquity.mozilla.com">Ubiquity</a> using <a href="http://mitcho.com/blog/projects/judging-noun-types/">the principles and criteria I laid out yesterday</a>, a task I&#8217;ve had <a href="http://ubiquity.mozilla.com/trac/ticket/746">in planning</a> for a while now. As I explained yesterday, improved suggestions and scoring from the built-in nountypes could directly translate to better and smarter suggestions, resulting in a better experience for all users. Here I&#8217;ll document some of the nountype quirks I&#8217;ve discovered so far and what remedy has been implemented or is planned.</p>

<p><em>Note: this blog post includes a number of graphs using HTML/CSS formatting. If you are reading this article through a feed reader or planet, I invite you to read it <a href="http://mitcho.com/blog/projects/nountype-quirks-day-1/">on my site</a>.</em></p>

<p><span id="more-2598"></span></p>

<h3><code>noun_type_percentage</code></h3>

<p>Here&#8217;s what a few different inputs originally returned:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td>20</td><td>20%</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td>20%</td><td>20%</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td>0.2</td><td>20%</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td>0.2%</td><td>20%</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td>20.0</td><td>2000%</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td>2 hens in the garden</td><td>2%</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
</table>

<p>Let me highlight a couple obvious quirks:</p>

<ol>
<li>In certain cases, where the numerical expression includes a decimal and is less than one, it is interpreted as a proportional, rather than percent, value, e.g. &#8220;0.2&#8221; → &#8220;20%&#8221;. &#8220;0.2%&#8221; is not even an option. This is the case even when explicitly adding a % sign.</li>
<li>All suggestions, including those where the numeral was extracted from a long string of text (e.g. &#8220;2 hens in the garden&#8221;), get the same score of 1.</li>
</ol>

<p>I just <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/rev/c3cd4af0f06a">committed a fix</a> so <code>noun_type_percentage</code> now&#8230;</p>

<ol>
<li>Counts the number of characters in the input which match <code>[\d.%]</code> and caps the score at (number of acceptable characters)/(length of input).</li>
<li>Applies a 10% penalty to strings which do not include &#8220;%&#8221;.</li>
<li>Also suggests the proportion interpretation (e.g. &#8220;0.2&#8221; → &#8220;20%&#8221;) for decimals less than 1 without a % sign, in addition to the original suggestion (&#8220;0.2%&#8221;), but with a slight penalty.</li>
</ol>
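<p>Here is a minimal sketch of those three rules. It is not the committed code: the function name and number-extracting pattern are mine, the &#8220;slight penalty&#8221; is assumed to be another factor of 0.9, and I&#8217;m assuming the cap is applied by taking the minimum of the penalized score and the character ratio.</p>

```javascript
// Sketch of the three rules above (not the actual Ubiquity source).
function suggestPercentages(input) {
  const match = input.match(/\d*\.?\d+/);
  if (!match) return [];
  const value = parseFloat(match[0]);

  // Rule 1: cap = share of characters that are digits, periods or "%".
  const acceptable = (input.match(/[\d.%]/g) || []).length;
  const cap = acceptable / input.length;

  // Rule 2: 10% penalty when no "%" was typed.
  const hasPercent = input.includes("%");
  const base = hasPercent ? 1 : 0.9;

  const suggestions = [{ text: value + "%", score: Math.min(base, cap) }];

  // Rule 3: decimals below 1 without "%" also get the proportion reading,
  // with an assumed extra factor of 0.9.
  if (!hasPercent && match[0].includes(".") && value < 1) {
    // toPrecision avoids floating-point noise such as 7.000000000000001.
    const scaled = Number((value * 100).toPrecision(12));
    suggestions.push({ text: scaled + "%", score: Math.min(base * 0.9, cap) });
  }
  return suggestions;
}
```

<p>For &#8220;2 hens in the garden&#8221; only one of the 20 characters is acceptable, so the score is capped at 1/20 = 0.05, while &#8220;0.2&#8221; yields both &#8220;0.2%&#8221; and &#8220;20%&#8221;.</p>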

<p>Here is what they now return:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td>20</td><td>20%</td><td><span class='scorebar' style='width:450px'></span> 0.9</td></tr>
<tr><td>20%</td><td>20%</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td rowspan='2'>0.2</td><td>0.2%</td><td><span class='scorebar' style='width:450px'></span> 0.9</td></tr>
<tr><td>20%</td><td><span class='scorebar' style='width:405px'></span> 0.81</td></tr>
<tr><td>0.2%</td><td>0.2%</td><td><span class='scorebar' style='width:500px'></span> 1</td></tr>
<tr><td>20.0</td><td>20%</td><td><span class='scorebar' style='width:450px'></span> 0.9</td></tr>
<tr><td>2 hens in the garden</td><td>2%</td><td><span class='scorebar' style='width:25px'></span> 0.05</td></tr>
</table>

<h3><code>noun_type_tag</code></h3>

<p>Here&#8217;s what a few different inputs originally returned. Keep in mind that currently in this test profile, the preexisting tags are &#8220;animal&#8221;, &#8220;help&#8221;, &#8220;test&#8221;, and &#8220;ubiquity&#8221;.</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td>animal</td><td>animal</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
<tr><td>mineral</td><td>mineral</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
<tr><td rowspan='2'>anim</td><td>animal</td><td><span class='scorebar' style='width:350px'></span> 0.7</td></tr>
<tr><td>anim</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
<tr><td rowspan='2'>help, test, ubiq</td><td>help,test,ubiquity</td><td><span class='scorebar' style='width:350px'></span> 0.7</td></tr>
<tr><td>help,test,ubiq</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
<tr><td rowspan='2'>google, yahoo, ubiq</td><td>google,yahoo,ubiquity</td><td><span class='scorebar' style='width:350px'></span> 0.7</td></tr>
<tr><td>google,yahoo,ubiq</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
<tr><td>google, , yahoo</td><td>google,yahoo</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
</table>

<p>Here are a few of <code>noun_type_tag</code>&#8217;s quirks:</p>

<ol>
<li>There are only two scores ever given out: 0.3 and 0.7.</li>
<li>Only the last tag in the list and whether it exists or not is taken into account.</li>
<li>When the last tag is incomplete, the completion is suggested with a higher score, but if the last tag is <em>exactly</em> equal to an existing tag, it gets the lower score.</li>
</ol>

<p>Ideally, we want <code>noun_type_tag</code> to look at each of the tags given to it, with higher scores for when there are more preexisting tags and fewer new ones. Keep in mind, though, that we only have to suggest the completion of the very last tag as that may be one where the user hasn&#8217;t completed typing yet&#8230; for earlier tags, we can assume (safely or not) that the user placed the comma where they meant to. We can&#8217;t teach Ubiquity to read minds, after all.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<p>With this in mind, I <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/rev/54e6a232ec3a">just made a change</a> to <code>noun_type_tag</code> which aims to follow these principles. The basic idea is that we start with a base score of 0.3 but then raise it via <a href="http://en.wikipedia.org/wiki/Nth_root"><i>n</i>th root</a> for every tag in the sequence which is preexisting. Here&#8217;s what the same inputs return now. Recall that the preexisting tags are &#8220;animal&#8221;, &#8220;help&#8221;, &#8220;test&#8221;, and &#8220;ubiquity&#8221;.</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td>animal</td><td>animal</td><td><span class='scorebar' style='width:275px'></span> 0.55</td></tr>
<tr><td>mineral</td><td>mineral</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
<tr><td rowspan='2'>anim</td><td>animal</td><td><span class='scorebar' style='width:275px'></span> 0.55</td></tr>
<tr><td>anim</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
<tr><td rowspan='2'>help, test, ubiq</td><td>help,test,ubiquity</td><td><span class='scorebar' style='width:430px'></span> 0.86</td></tr>
<tr><td>help,test,ubiq</td><td><span class='scorebar' style='width:370px'></span> 0.74</td></tr>
<tr><td rowspan='2'>google, yahoo, ubiq</td><td>google,yahoo,ubiquity</td><td><span class='scorebar' style='width:275px'></span> 0.55</td></tr>
<tr><td>google,yahoo,ubiq</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
<tr><td>google, , yahoo</td><td>google,yahoo</td><td><span class='scorebar' style='width:150px'></span> 0.3</td></tr>
</table>
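<p>The scores in this table are consistent with taking one square root of the base score per preexisting tag, i.e. 0.3<sup>1/2<sup><i>k</i></sup></sup> for <i>k</i> preexisting tags. That formula is my reading of the table rather than a quote from the code, and the function names below are made up for this sketch:</p>

```javascript
const BASE_SCORE = 0.3;

// One square root per tag that already exists (formula inferred from the table).
function scoreTagList(tags, existingTags) {
  const known = tags.filter((tag) => existingTags.includes(tag)).length;
  return Math.pow(BASE_SCORE, 1 / Math.pow(2, known));
}

// Suggest the input as typed, plus completions of the last tag only.
function suggestTags(input, existingTags) {
  const tags = input.split(",").map((t) => t.trim()).filter((t) => t !== "");
  if (tags.length === 0) return []; // purely whitespace: nothing to suggest

  const head = tags.slice(0, -1);
  const last = tags[tags.length - 1];
  const candidates = [tags];
  for (const tag of existingTags) {
    if (tag !== last && tag.startsWith(last)) candidates.push(head.concat(tag));
  }

  return candidates
    .map((list) => ({ text: list.join(","), score: scoreTagList(list, existingTags) }))
    .sort((a, b) => b.score - a.score);
}

const existing = ["animal", "help", "test", "ubiquity"];
console.log(suggestTags("help, test, ubiq", existing));
```

<p>For &#8220;help, test, ubiq&#8221; this suggests &#8220;help,test,ubiquity&#8221; (three preexisting tags, about 0.86) ahead of &#8220;help,test,ubiq&#8221; (two preexisting tags, about 0.74).</p>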

<h3><code>noun_type_awesomebar</code></h3>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='4'>moz</td><td class="sugg">http://www.mozilla.com/</td><td class="score"><span style="width: 400px;" class="scorebar">&nbsp;</span> 0.8</td></tr>
<tr><td class="sugg">https://wiki.mozilla.org/Labs/Ubiquity/ Parser_2_API_Conversion_Tutorial</td><td class="score"><span style="width: 400px;" class="scorebar">&nbsp;</span> 0.8</td></tr>
<tr><td class="sugg">http://en-us.start3.mozilla.com/ firefox?client=firefox-a&#038;rls= org.mozilla:en-US:official</td><td class="score"><span style="width: 400px;" class="scorebar">&nbsp;</span> 0.8</td></tr>
<tr><td class="sugg">http://en-us.www.mozilla.com/en-US/firefox/about/</td><td class="score"><span style="width: 400px;" class="scorebar">&nbsp;</span> 0.8</td></tr>
</table>

<p>There are a couple quirks here:</p>

<ol>
<li>All suggestions are returned with the same scores.</li>
<li>The nountype returns the URL of the entry as the HTML-formatted result and the title as the text-formatted result, which clearly does not make sense. However, it&#8217;s not clear to me whether the title, URL, or some combination of both is what we should be returning as the suggestion text presented to the user.<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup></li>
</ol>

<p>I <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/rev/cb98c72364db">just rewrote <code>noun_type_awesomebar</code></a> to actually do some differential scoring. This new version also presents the URL or title depending on whichever had a better match using the <code>matchScore</code> function.<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup></p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='4'>moz</td><td class="sugg">www.mozilla.com</td><td class="score"><span style="width: 350px;" class="scorebar">&nbsp;</span> 0.7</td></tr>
<tr><td class="sugg">https://wiki.mozilla.org/Labs/Ubiquity/ Parser_2_API_Conversion_Tutorial</td><td class="score"><span style="width: 315px;" class="scorebar">&nbsp;</span> 0.63</td></tr>
<tr><td class="sugg">http://en-us.start3.mozilla.com/ firefox?client=firefox-a&#038;rls= org.mozilla:en-US:official</td><td class="score"><span style="width: 305px;" class="scorebar">&nbsp;</span> 0.61</td></tr>
<tr><td class="sugg">http://en-us.www.mozilla.com/en-US/firefox/about/</td><td class="score"><span style="width: 300px;" class="scorebar">&nbsp;</span> 0.6</td></tr>
</table>

<h3><code>noun_type_url</code></h3>

<p>The purpose of <code>noun_type_url</code>&#8217;s suggest function is two-fold: first, to accept strings which may look like a URL and, second, to suggest URLs from the history just like <code>noun_type_awesomebar</code>, but only based on URL matches and not title matches.<sup id="fnref:4"><a href="#fn:4" rel="footnote">4</a></sup> Here are a few sample inputs:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='5'>moz</td><td class="sugg">http://www.mozilla.com/</td><td class="score"><span style="width: 450px;" class="scorebar">&nbsp;</span> 0.9</td></tr>
<tr><td class="sugg">http://moz</td><td class="score"><span style="width: 250px;" class="scorebar">&nbsp;</span> 0.5</td></tr>
<tr><td class="sugg">https://wiki.mozilla.org/Labs/Ubiquity/ Parser_2_API_Conversion_Tutorial</td><td class="score"><span style="width: 450px;" class="scorebar">&nbsp;</span> 0.9</td></tr>
<tr><td class="sugg">http://en-us.start3.mozilla.com/ firefox?client=firefox-a&#038;rls= org.mozilla:en-US:official</td><td class="score"><span style="width: 450px;" class="scorebar">&nbsp;</span> 0.9</td></tr>
<tr><td class="sugg">http://en-us.www.mozilla.com/en-US/firefox/about/</td><td class="score"><span style="width: 450px;" class="scorebar">&nbsp;</span> 0.9</td></tr>

<tr><td rowspan='1'>test</td><td class="sugg">http://test</td><td class="score"><span style="width: 250px;" class="scorebar">&nbsp;</span> 0.5</td></tr>
<tr><td rowspan='1'>http://</td><td class="sugg">http://</td><td class="score"><span style="width: 250px;" class="scorebar">&nbsp;</span> 0.5</td></tr>
<tr><td rowspan='1'>http:</td><td class="sugg">http:</td><td class="score"><span style="width: 250px;" class="scorebar">&nbsp;</span> 0.5</td></tr>
<tr><td rowspan='1'>http</td><td class="sugg">http</td><td class="score"><span style="width: 250px;" class="scorebar">&nbsp;</span> 0.5</td></tr>
<tr><td rowspan='1'>_test</td><td class="sugg">http://_test</td><td class="score"><span style="width: 250px;" class="scorebar">&nbsp;</span> 0.5</td></tr>
<tr><td rowspan='1'>hello world!</td><td class="sugg">http://hello world!</td><td class="score"><span style="width: 250px;" class="scorebar">&nbsp;</span> 0.5</td></tr>
</table>

<p>Oh, where to begin!? Here are some initial quirks&#8230; it&#8217;s possible that you could think of more!</p>

<ol>
<li>There is no differential scoring&#8230; only 0.9 for suggestions from history and 0.5 for URL-like strings.</li>
<li>A number of invalid domain names are being accepted and turned into suggestions (&#8220;hello world!&#8221;, &#8220;_test&#8221;, etc.).</li>
<li>It&#8217;s trying to be smart by suggesting &#8220;http://&#8221; as a default <a href="http://en.wikipedia.org/wiki/URI_scheme">URI scheme</a> but doing so even for prefixes (initial substrings) of the word &#8220;http&#8221; itself.</li>
</ol>

<p>With these thoughts in mind, I <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/rev/26f179661107">just took a first stab</a> at improving this situation. Here are some features of the new implementation:</p>

<ol>
<li>History entries are scored in the same way as in <code>noun_type_awesomebar</code>, using <code>matchScore</code>.</li>
<li>URLs without an explicit <a href="http://en.wikipedia.org/wiki/URI_scheme">URI scheme</a> (like &#8220;http://&#8221;) get a 10% penalty.</li>
<li>&#8220;http://&#8221; is only suggested if none of a long list of common URI schemes is detected.</li>
<li>It repairs schemes which are missing a slash or two, suggesting for example &#8220;http:hello.com&#8221; → &#8220;http://hello.com&#8221;.</li>
<li>It actually uses Firefox&#8217;s own <a href="https://developer.mozilla.org/en/nsIIDNService">IDNService</a> to check if the domain name is a valid <a href="http://en.wikipedia.org/wiki/Internationalized_domain_name">internationalized domain name</a>. If it&#8217;s an IDN as opposed to LDH (&#8220;letters, digits, and hyphens&#8221;), it gets a 10% penalty. If it&#8217;s not even a valid IDN, it is ruled out (see last two example inputs below).</li>
<li>There are also penalties for only being a domain name with no path and for the domain not having any periods (.) in it.</li>
</ol>
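<p>A simplified sketch of the non-history half of this logic might look like the following. Several things here are assumptions for illustration: the scheme list is a short hypothetical one, a plain LDH pattern stands in for the IDNService check (so valid IDNs are rejected rather than penalized), the two penalty factors in the last rule are placeholders, and scheme completion (e.g. &#8220;http&#8221; → &#8220;https://&#8221;) is left out entirely.</p>

```javascript
// Hypothetical short list; the real one is described as "a long list".
const KNOWN_SCHEMES = ["http", "https", "ftp", "file"];
// Stand-in for the IDNService check: letters, digits and hyphens only.
const LDH_DOMAIN = /^[a-z0-9-]+(\.[a-z0-9-]+)*$/i;

// Returns { text, score } or null when the input is ruled out.
function suggestUrl(input) {
  let score = 1;
  let scheme;
  let rest;

  // Accept "scheme:", "scheme:/" or "scheme://" so missing slashes get repaired.
  const m = input.match(/^([a-z][a-z0-9+.-]*):\/{0,2}(.*)$/i);
  if (m && KNOWN_SCHEMES.includes(m[1].toLowerCase())) {
    scheme = m[1].toLowerCase();
    rest = m[2];
  } else {
    scheme = "http";
    rest = input;
    score *= 0.9; // no explicit scheme: 10% penalty
  }

  const slash = rest.indexOf("/");
  const domain = slash === -1 ? rest : rest.slice(0, slash);
  if (!LDH_DOMAIN.test(domain)) return null; // invalid domain: ruled out

  if (slash === -1) score *= 0.9;          // placeholder: domain only, no path
  if (!domain.includes(".")) score *= 0.8; // placeholder: no period in domain

  return { text: scheme + "://" + rest, score: score };
}
```

<p>With this sketch, &#8220;http:hello.com&#8221; is repaired to &#8220;http://hello.com&#8221;, while &#8220;_test&#8221; and &#8220;hello world!&#8221; return <code>null</code>.</p>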

<p>Here is what our suggestions now look like:</p>

<table style='border:0' class='scoretable'>
<tr><th>input</th><th>suggestion</th><th><img src="http://mitcho.com/blog/wp-content/uploads/2009/07/tuner-top.png" alt="tuner-top.png" border="0" width="500" height="30" /></th></tr>
<tr><td rowspan='5'>moz</td><td class="sugg">http://www.mozilla.com/</td><td class="score"><span style="width: 300px;" class="scorebar">&nbsp;</span> 0.6</td></tr>
<tr><td class="sugg">http://moz</td><td class="score"><span style="width: 325px;" class="scorebar">&nbsp;</span> 0.65</td></tr>
<tr><td class="sugg">https://wiki.mozilla.org/Labs/Ubiquity/ Parser_2_API_Conversion_Tutorial</td><td class="score"><span style="width: 315px;" class="scorebar">&nbsp;</span> 0.63</td></tr>
<tr><td class="sugg">http://en-us.start3.mozilla.com/ firefox?client=firefox-a&#038;rls= org.mozilla:en-US:official</td><td class="score"><span style="width: 305px;" class="scorebar">&nbsp;</span> 0.61</td></tr>
<tr><td class="sugg">http://en-us.www.mozilla.com/en-US/firefox/about/</td><td class="score"><span style="width: 300px;" class="scorebar">&nbsp;</span> 0.6</td></tr>

<tr><td rowspan='1'>test</td><td class="sugg">http://test</td><td class="score"><span style="width: 325px;" class="scorebar">&nbsp;</span> 0.65</td></tr>
<tr><td rowspan='2'>http://</td><td class="sugg">http://</td><td class="score"><span style="width: 500px;" class="scorebar">&nbsp;</span> 1</td></tr>
<tr><td class="sugg">shttp://</td><td class="score"><span style="width: 375px;" class="scorebar">&nbsp;</span> 0.75</td></tr>

<tr><td rowspan='2'>http:</td><td class="sugg">http://</td><td class="score"><span style="width: 450px;" class="scorebar">&nbsp;</span> 0.9</td></tr>
<tr><td class="sugg">shttp://</td><td class="score"><span style="width: 350px;" class="scorebar">&nbsp;</span> 0.7</td></tr>

<tr><td rowspan='4'>http</td><td class="sugg">http://</td><td class="score"><span style="width: 360px;" class="scorebar">&nbsp;</span> 0.72</td></tr>
<tr><td class="sugg">https://</td><td class="score"><span style="width: 355px;" class="scorebar">&nbsp;</span> 0.71</td></tr>
<tr><td class="sugg">shttp://</td><td class="score"><span style="width: 340px;" class="scorebar">&nbsp;</span> 0.68</td></tr>
<tr><td class="sugg">http://http</td><td class="score"><span style="width: 325px;" class="scorebar">&nbsp;</span> 0.65</td></tr>

<tr><td rowspan='1'>_test</td><td class="sugg"><i>none</i></td><td class="score">&nbsp;</td></tr>
<tr><td rowspan='1'>hello world!</td><td class="sugg"><i>none</i></td><td class="score">&nbsp;</td></tr></table>

<h3>See you tomorrow~</h3>

<p>Alright, enough nountype wrangling for one day. I&#8217;ll be back again tomorrow for another installment.</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>If we could make assumptions about what tags look like, for example that they are always pretty short or use certain character classes, we could use such factors to judge non-preexisting tags for &#8220;tagginess&#8221; as well. Unfortunately, it&#8217;s possible (though unlikely) that a user would prefer really long tag strings, and of course Firefox allows tags in any Unicode code range. The only strings we can immediately rule out as impossible are ones which are purely whitespace.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:2">
<p>It&#8217;s unclear whether the method we&#8217;re using (<a href="https://developer.mozilla.org/en/nsIAutoCompleteSearch"><code>nsIAutoCompleteSearch</code></a>) is actually searching titles or not&#8230; it currently looks like it&#8217;s only looking at the URLs. Perhaps the title query is what we&#8217;re supposed to enter in <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=507315">the mystery parameter</a>.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:3">
<p>I hope to discuss the <code>matchScore</code> function in a separate blog post later.&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:4">
<p>While writing up this section I ran into a bug whereby, when both <code>noun_type_awesomebar</code> and <code>noun_type_url</code> are active, only one of their async callbacks from <code>Utils.history.search</code> is returned. Thus, if lucky, only one of the nountypes will return the history results and, if unlucky, the parse query will not complete. Filed as <a href="http://ubiquity.mozilla.com/trac/ticket/845">trac #845</a>.&#160;<a href="#fnref:4" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/nountype-quirks-day-1/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Scoring for Optimization</title>
		<link>http://mitcho.com/blog/observation/scoring-for-optimization/</link>
		<comments>http://mitcho.com/blog/observation/scoring-for-optimization/#comments</comments>
		<pubDate>Fri, 24 Apr 2009 09:51:31 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[observation]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[candidates]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[harmonic analysis]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[order]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[ranking]]></category>
		<category><![CDATA[score]]></category>
		<category><![CDATA[suggestions]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1850</guid>
		<description><![CDATA[Suppose you have a number of competing candidates, each of which can be ranked with a score, but it takes a little time to calculate each candidate&#8217;s score. You&#8217;re only interested in the top candidates. You want to come up with a scoring scheme where you can throw the extra candidates out of consideration earlier [...]
]]></description>
			<content:encoded><![CDATA[<p>Suppose you have a number of competing candidates, each of which can be ranked with a score, but it takes a little time to calculate each candidate&#8217;s score. You&#8217;re only interested in the top <img src='http://s0.wp.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n' title='n' class='latex' /> candidates. <strong>You want to come up with a scoring scheme where you can throw the extra candidates out of consideration earlier without sacrificing quality.</strong> Such is <a href="http://mitcho.com/blog/observation/scoring-and-ranking-suggestions/">the problem of scoring and ranking suggestions in Ubiquity</a>. What properties must such a scoring system have?</p>

<p><em>This blog post includes a lot of complex CSS-formatted graphs which may be best viewed in — what else? — <a href="http://mozilla.com">Firefox</a>. You may also want to <a href="http://mitcho.com/blog/observation/scoring-for-optimization/">access this blog post directly</a> rather than through a planet.</em></p>

<p><style type='text/css'>
.mitchostable, .mitchostable tr, .mitchostable td, .mitchostable th {
  border:0;
  margin:0;
  padding:1px;
  background-color: transparent;
  text-align:left;
}
tr.cutoff th, tr.cutoff td { border-bottom: 1px #666 solid }
tr.cutoff td.cutoff {
  font-style: italic;
  font-size: 0.8em;
  color: #666;
  border: 0;
}
.mitchostable img { height: 7px }
.mitchostable span.bar { 
  background-color: #ccc;
  display: inline-block;
  height: 7px;
}
.mitchostable span.arrow-right { 
  background: #ccc url(http://mitcho.com/i/cccarrow-right.png) no-repeat scroll center right;
  display: inline-block;
  height: 7px;
}
.mitchostable span.arrow-left { 
  background: #ccc url(http://mitcho.com/i/cccarrow-left.png) no-repeat scroll center left;
  display: inline-block;
  height: 7px;
}
.mitchostable span.bound-right { 
  background: transparent url(http://mitcho.com/i/bound-right.png) no-repeat scroll center right;
  display: inline-block;
  height: 7px;
}
.mitchostable.threshold {
  background: transparent url(http://mitcho.com/i/000.png) repeat-y scroll 180px 0px;
}
.mitchostable.threshold2 {
  background: transparent url(http://mitcho.com/i/000.png) repeat-y scroll 70px 0px;
}
.mitchostable.threshold *, .mitchostable.threshold2 * {
  background: transparent;
}</p>

<p></style></p>

<table border='0' class='mitchostable'>

<tr><th>candidate 8</th><td><span class='bar' style='width:180px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>candidate 2</th><td><span class='bar' style='width:166px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>candidate 9</th><td><span class='bar' style='width:123px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>candidate 3</th><td><span class='bar' style='width:107px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr class='cutoff'><th>candidate 10</th><td><span class='bar' style='width:96px'>&nbsp;</span></td><td rowspan='2' class='cutoff'>CUTOFF</td></tr>

<tr><th>candidate 5</th><td><span class='bar' style='width:70px'>&nbsp;</span></td></tr>
<tr><th>candidate 1</th><td><span class='bar' style='width:50px'>&nbsp;</span></td></tr>
<tr><th>candidate 7</th><td><span class='bar' style='width:43px'>&nbsp;</span></td></tr>
<tr><th>&#8230;</th><td>&nbsp;</td><td>&nbsp;</td></tr>
</table>

<p>One portion of the problem description above merits clarification: I define &#8220;without sacrificing quality&#8221; to mean that, if we did not throw out any candidates early and waited until all the scores were computed fully and accurately, we would still get the same top <img src='http://s0.wp.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n' title='n' class='latex' /> winners. This already gives us the key insight towards an appropriate solution: <em>we can only throw out a candidate when we know that it has no further chance of making it into the top <img src='http://s0.wp.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n' title='n' class='latex' /> candidates.</em></p>

<p><span id="more-1850"></span></p>

<h3>Let&#8217;s get formal</h3>

<p>Let&#8217;s call <img src='http://s0.wp.com/latex.php?latex=S_%7Bi%7D%28t%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_{i}(t)' title='S_{i}(t)' class='latex' /> the score of candidate <img src='http://s0.wp.com/latex.php?latex=C_%7Bi%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='C_{i}' title='C_{i}' class='latex' /> at time <img src='http://s0.wp.com/latex.php?latex=t&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t' title='t' class='latex' /> in the derivation and we&#8217;ll assume that the score derivations are done in parallel with a unique origin (<img src='http://s0.wp.com/latex.php?latex=t%3D0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=0' title='t=0' class='latex' />).<sup id="fnref:2"><a href="#fn:2" rel="footnote">1</a></sup> We&#8217;ll use the notation <img src='http://s0.wp.com/latex.php?latex=S_%7Bi%7D%28%5Cinfty%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_{i}(&#92;infty)' title='S_{i}(&#92;infty)' class='latex' /> to represent the equilibrium or final score, equal to <img src='http://s0.wp.com/latex.php?latex=S_%7Bi%7D%28t%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_{i}(t)' title='S_{i}(t)' class='latex' /> for all <img src='http://s0.wp.com/latex.php?latex=t+%3E+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t &gt; ' title='t &gt; ' class='latex' /> a certain <img src='http://s0.wp.com/latex.php?latex=t%5E%7B%5Cprime%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t^{&#92;prime}' title='t^{&#92;prime}' class='latex' /> which exists for each candidate. This function <img src='http://s0.wp.com/latex.php?latex=S_%7Bi%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_{i}' title='S_{i}' class='latex' /> thus defines a <a href="http://en.wikipedia.org/wiki/time series">time series</a> for each candidate.</p>

<p>Given a set of candidates <img src='http://s0.wp.com/latex.php?latex=%5Cleft%5C%7BC_1%2CC_2%2C%5Cldots%2CC_k%5Cright%5C%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;left&#92;{C_1,C_2,&#92;ldots,C_k&#92;right&#92;}' title='&#92;left&#92;{C_1,C_2,&#92;ldots,C_k&#92;right&#92;}' class='latex' />, we want to find the best subset of <img src='http://s0.wp.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n' title='n' class='latex' /> candidates; that is, <img src='http://s0.wp.com/latex.php?latex=%5Cleft%5C%7BC_%7Bi_1%7D%2CC_%7Bi_2%7D%2C%5Cldots%2CC_%7Bi_n%7D%5Cright%5C%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;left&#92;{C_{i_1},C_{i_2},&#92;ldots,C_{i_n}&#92;right&#92;}' title='&#92;left&#92;{C_{i_1},C_{i_2},&#92;ldots,C_{i_n}&#92;right&#92;}' class='latex' /> such that</p>

<p><center><img src='http://s.wordpress.com/latex.php?latex=%5Cdisplaystyle%20%5Cforall_%7B%20i%5Cin%20%5C%7Bi_1%2C%5Cdots%2Ci_n%5C%7D%2C%20j%5Cin%20%5C%7B1%2C%5Cdots%2Ck%5C%7D%5Csetminus%5C%7Bi_1%2C%5Cldots%2Ci_n%5C%7D%7D%20S_%7Bi%7D%28%5Cinfty%29%20%5Cgeq%20S_%7Bj%7D%28%5Cinfty%29&#038;bg=ffffff&#038;fg=000000&#038;s=1' alt='\forall_{ i\in \{i_1,\dots,i_n\}, j\in \{1,\dots,k\}\setminus\{i_1,\ldots,i_n\}} S_{i}(\infty) \geq S_{j}(\infty)'/>.</center></p>
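
<p>As a baseline for what any early-elimination scheme has to reproduce, here is a minimal sketch of the exhaustive version: wait for every final score, then keep the best <code>n</code>. The function name <code>topN</code> and the input shape are assumptions for illustration, and the pixel widths from the first diagram stand in for final scores.</p>

```javascript
// Hypothetical baseline: every candidate is an [id, finalScore] pair,
// where finalScore plays the role of S_i(infinity).
function topN(finalScores, n) {
  return finalScores
    .slice()                      // leave the caller's array untouched
    .sort((a, b) => b[1] - a[1])  // highest final score first
    .slice(0, n)
    .map(pair => pair[0]);
}

// Bar widths from the first diagram, deliberately listed out of order.
const diagramScores = [
  [1, 50], [2, 166], [3, 107], [5, 70],
  [7, 43], [8, 180], [9, 123], [10, 96],
];
console.log(topN(diagramScores, 5)); // [8, 2, 9, 3, 10]
```

<p>Any scheme that throws candidates out early counts as &#8220;not sacrificing quality&#8221; only if it returns the same set as this exhaustive version.</p>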

<h3>Approach 1: A Threshold Model</h3>

<p>The key insight above would naturally give us what I call the threshold model. Here, we require the score sequences to be non-increasing: <img src='http://s0.wp.com/latex.php?latex=%5Cforall_%7Bt+%3C+t%5E%7B%5Cprime%7D%7D+S_%7Bi%7D%28t%29+%5Cgeq+S_%7Bi%7D%28t%5E%7B%5Cprime%7D%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;forall_{t &lt; t^{&#92;prime}} S_{i}(t) &#92;geq S_{i}(t^{&#92;prime})' title='&#92;forall_{t &lt; t^{&#92;prime}} S_{i}(t) &#92;geq S_{i}(t^{&#92;prime})' class='latex' />. This way, we can naturally throw out candidates which have dropped below a certain threshold <img src='http://s0.wp.com/latex.php?latex=M&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='M' title='M' class='latex' /> (or attained a certain level of badness, you might say), since we can then be sure they will never recover.</p>

<p>For example, suppose the following diagram represents the scores of five different candidates after the first four time steps of the derivation. (The full gray bar marks the initial score (<img src='http://s0.wp.com/latex.php?latex=S_i%280%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_i(0)' title='S_i(0)' class='latex' />) and the arrows indicate the successive score differentials.) The vertical line marks the threshold, <img src='http://s0.wp.com/latex.php?latex=M&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='M' title='M' class='latex' />.</p>

<table border='0' class='mitchostable threshold'>
<tr><th>candidate 1</th><td><span class='bar' style='width:130px'>&nbsp;</span><span class='arrow-left' style='width:20px'>&nbsp;</span><span class='arrow-left' style='width:13px'>&nbsp;</span><span class='arrow-left' style='width:8px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>candidate 2</th><td><span class='bar' style='width:80px'>&nbsp;</span><span class='arrow-left' style='width:50px'>&nbsp;</span><span class='arrow-left' style='width:3px'>&nbsp;</span><span class='arrow-left' style='width:20px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>candidate 3</th><td><span class='bar' style='width:110px'>&nbsp;</span><span class='arrow-left' style='width:30px'>&nbsp;</span><span class='arrow-left' style='width:27px'>&nbsp;</span><span class='arrow-left' style='width:15px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>candidate 4</th><td><span class='bar' style='width:53px'>&nbsp;</span><span class='arrow-left' style='width:20px'>&nbsp;</span><span class='arrow-left' style='width:50px'>&nbsp;</span><span class='arrow-left' style='width:15px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>candidate 5</th><td><span class='bar' style='width:114px'>&nbsp;</span><span class='arrow-left' style='width:3px'>&nbsp;</span><span class='arrow-left' style='width:3px'>&nbsp;</span><span class='arrow-left' style='width:6px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>&#8230;</th><td>&nbsp;</td><td>&nbsp;</td></tr>
</table>

<p>We can tell after four steps that candidates 2 and 4, given that the score sequences are non-increasing, have no chance to finish their derivation with a score <img src='http://s0.wp.com/latex.php?latex=%3E+M&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&gt; M' title='&gt; M' class='latex' />. What is important to note, however, is that <em>candidate 4 already had no chance of beating the threshold after three steps.</em> <strong>There was no need to calculate the fourth derivation of the score of candidate 4</strong> (<img src='http://s0.wp.com/latex.php?latex=S_%7B4%7D%284%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_{4}(4)' title='S_{4}(4)' class='latex' />). In other words, after three steps, we could completely take candidate 4 out of the running and after another step, take candidate 2 out of the running.</p>

<table>
<tr><td colspan='2'><img src='http://s0.wp.com/latex.php?latex=t%3D2&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=2' title='t=2' class='latex' /></td><td colspan='2'><img src='http://s0.wp.com/latex.php?latex=t%3D3&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=3' title='t=3' class='latex' /></td><td colspan='2'><img src='http://s0.wp.com/latex.php?latex=t%3D4&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=4' title='t=4' class='latex' /></td></tr>
<tr>
<td><table border='0' class='mitchostable threshold2'>
<tr><th>C1</th><td><span class='bar' style='width:113px'>&nbsp;</span><span class='arrow-left' style='width:8px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C2</th><td><span class='bar' style='width:83px'>&nbsp;</span><span class='arrow-left' style='width:20px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C3</th><td><span class='bar' style='width:117px'>&nbsp;</span><span class='arrow-left' style='width:15px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C4</th><td><span class='bar' style='width:73px'>&nbsp;</span><span class='arrow-left' style='width:15px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C5</th><td><span class='bar' style='width:70px'>&nbsp;</span><span class='arrow-left' style='width:6px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>&#8230;</th><td>&nbsp;</td><td>&nbsp;</td></tr>
</table></td><td>→</td>
<td><table border='0' class='mitchostable threshold2'>
<tr><th>C1</th><td><span class='bar' style='width:100px'>&nbsp;</span><span class='arrow-left' style='width:13px'>&nbsp;</span><span class='arrow-left' style='width:8px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C2</th><td><span class='bar' style='width:80px'>&nbsp;</span><span class='arrow-left' style='width:3px'>&nbsp;</span><span class='arrow-left' style='width:20px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C3</th><td><span class='bar' style='width:90px'>&nbsp;</span><span class='arrow-left' style='width:27px'>&nbsp;</span><span class='arrow-left' style='width:15px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th><strike>C4</strike></th><td><span class='bar' style='width:23px'>&nbsp;</span><span class='arrow-left' style='width:50px'>&nbsp;</span><span class='arrow-left' style='width:15px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C5</th><td><span class='bar' style='width:67px'>&nbsp;</span><span class='arrow-left' style='width:3px'>&nbsp;</span><span class='arrow-left' style='width:6px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>&#8230;</th><td>&nbsp;</td><td>&nbsp;</td></tr>
</table></td><td>→</td>
<td><table border='0' class='mitchostable threshold2'>
<tr><th>C1</th><td><span class='bar' style='width:80px'>&nbsp;</span><span class='arrow-left' style='width:20px'>&nbsp;</span><span class='arrow-left' style='width:13px'>&nbsp;</span><span class='arrow-left' style='width:8px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th><strike>C2</strike></th><td><span class='bar' style='width:30px'>&nbsp;</span><span class='arrow-left' style='width:50px'>&nbsp;</span><span class='arrow-left' style='width:3px'>&nbsp;</span><span class='arrow-left' style='width:20px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C3</th><td><span class='bar' style='width:60px'>&nbsp;</span><span class='arrow-left' style='width:30px'>&nbsp;</span><span class='arrow-left' style='width:27px'>&nbsp;</span><span class='arrow-left' style='width:15px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th><strike>C4</strike></th><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><th>C5</th><td><span class='bar' style='width:64px'>&nbsp;</span><span class='arrow-left' style='width:3px'>&nbsp;</span><span class='arrow-left' style='width:3px'>&nbsp;</span><span class='arrow-left' style='width:6px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>&#8230;</th><td>&nbsp;</td><td>&nbsp;</td></tr>
</table></td><td>→</td>
</tr>
</table>

<p>This non-increasing score approach was used in Ubiquity Parser 2 until just recently, and you can in fact still play with it on the <a href="http://mitcho.com/code/ubiquity/parser-demo/">online Ubiquity Parser TNG demo</a>. In that version, every parse started with an initial score of 1 and every score factor would be a value between 0 and 1. Every score factor was multiplied onto the previous score throughout the derivation, making it trivially non-increasing.</p>
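
<p>The multiplicative scheme just described can be sketched roughly as follows. This is not the parser&#8217;s actual code: the function name <code>thresholdPrune</code>, the candidate shape and the sample factors are all assumptions chosen for illustration.</p>

```javascript
// Hypothetical candidate shape: { name, factors }, every factor in [0, 1].
// Each score starts at 1 and is multiplied by one factor per time step,
// so it can never increase.
function thresholdPrune(candidates, threshold) {
  const state = candidates.map(c => ({
    name: c.name, factors: c.factors, score: 1, alive: true,
  }));
  const maxSteps = Math.max(...candidates.map(c => c.factors.length));
  let factorsApplied = 0;

  for (let t = 0; t < maxSteps; t++) {
    for (const s of state) {
      // Dead candidates are skipped: their remaining factors are never computed.
      if (!s.alive || t >= s.factors.length) continue;
      s.score *= s.factors[t];
      factorsApplied++;
      // Once below the threshold, a non-increasing score cannot recover.
      if (s.score < threshold) s.alive = false;
    }
  }
  return {
    survivors: state.filter(s => s.alive).map(s => s.name),
    factorsApplied,
  };
}

const sampleParses = [
  { name: "A", factors: [1, 0.5, 1] },    // ends at 0.5
  { name: "B", factors: [0.5, 0.5, 1] },  // 0.25 after two steps
  { name: "C", factors: [0.25, 1, 1] },   // 0.25 after one step
];
console.log(thresholdPrune(sampleParses, 0.3));
// { survivors: [ 'A' ], factorsApplied: 6 }
```

<p>With a threshold of 0.3, only six of the nine factors are ever evaluated. Note that the number of survivors depends entirely on the threshold and the candidate set, not on how many suggestions we want.</p>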

<p><strong>The problem with this approach</strong> is twofold: it is unclear how to choose a smart threshold, and, given a constant threshold, you may get a different number of results for every candidate set (i.e. parser query). If your score indicates a meaningful value with an a priori specified target of acceptable values, having a threshold makes sense. In the case of Ubiquity, however, the interface expects a certain number of suggestions to be returned.<sup id="fnref:1"><a href="#fn:1" rel="footnote">2</a></sup> If we plan to display five suggestions but the parser only returns four, even though there were other candidates, there must be a very good reason and justification for that threshold value.</p>

<h3>Approach 2: Raising the Bar</h3>

<p>The problem with Approach 1 was that there was no way of guaranteeing that we would yield our predefined <img src='http://s0.wp.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n' title='n' class='latex' /> winning candidates. Even if at some point in the derivation we are left with <img src='http://s0.wp.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n' title='n' class='latex' /> candidates still above the threshold, as the only restriction we have is that our score series are non-increasing, there is still a possibility that those remaining <img src='http://s0.wp.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n' title='n' class='latex' /> candidates&#8217; scores will drop below <img src='http://s0.wp.com/latex.php?latex=M&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='M' title='M' class='latex' /> later in the derivation.</p>

<p>We must instead at some point in the derivation identify <strong>(a)</strong> a set of at least <img src='http://s0.wp.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n' title='n' class='latex' /> candidates which will not get &#8220;worse&#8221; in the derivation and <strong>(b)</strong> candidates which have no chance of overtaking the (a) candidates. In this situation we can safely throw out the (b) candidates.</p>

<p>One way to do this is to require that all the scores <img src='http://s0.wp.com/latex.php?latex=S_%7Bi%7D%28t%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_{i}(t)' title='S_{i}(t)' class='latex' /> are <strong>bounded and non-decreasing</strong>. By virtue of being non-decreasing, our top candidates at any point in our derivation will never get &#8220;worse&#8221; afterwards, satisfying condition (a). If relatively early in the computation we can compute a bound <img src='http://s0.wp.com/latex.php?latex=B_i&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='B_i' title='B_i' class='latex' />, we can identify candidates which will never surpass the top candidates in group (a) above, satisfying condition (b).</p>

<p>In the example below, <img src='http://s0.wp.com/latex.php?latex=n%3D2&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n=2' title='n=2' class='latex' /> and the thin bars mark the upper bounds <img src='http://s0.wp.com/latex.php?latex=B_i&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='B_i' title='B_i' class='latex' />. At <img src='http://s0.wp.com/latex.php?latex=t%3D1&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=1' title='t=1' class='latex' /> we can identify candidates 2 and 4 as being our top two candidates. Note that there is one candidate, candidate 5, whose upper bound <img src='http://s0.wp.com/latex.php?latex=B_5&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='B_5' title='B_5' class='latex' /> is less than both <img src='http://s0.wp.com/latex.php?latex=S_2%281%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_2(1)' title='S_2(1)' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=S_4%281%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_4(1)' title='S_4(1)' class='latex' />. By definition <img src='http://s0.wp.com/latex.php?latex=S_5%28%5Cinfty%29+%5Cleq+B_5&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_5(&#92;infty) &#92;leq B_5' title='S_5(&#92;infty) &#92;leq B_5' class='latex' /> and because the scores are non-decreasing <img src='http://s0.wp.com/latex.php?latex=S_2%281%29+%5Cleq+S_2%28%5Cinfty%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_2(1) &#92;leq S_2(&#92;infty)' title='S_2(1) &#92;leq S_2(&#92;infty)' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=S_4%281%29+%5Cleq+S_4%28%5Cinfty%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_4(1) &#92;leq S_4(&#92;infty)' title='S_4(1) &#92;leq S_4(&#92;infty)' class='latex' />. Therefore</p>

<p><center><img src='http://s.wordpress.com/latex.php?latex=S_5%28%5Cinfty%29%20%3C%20S_2%28%5Cinfty%29&#038;bg=ffffff&#038;fg=000000&#038;s=1' alt='S_5(\infty) < S_2(\infty)'/> and <img src='http://s.wordpress.com/latex.php?latex=S_5%28%5Cinfty%29%20%3C%20S_4%28%5Cinfty%29&#038;bg=ffffff&#038;fg=000000&#038;s=1' alt='S_5(\infty) < S_4(\infty)'/></center></p>

<p>and we can thus throw out candidate 5 at this point. By the same logic, after <img src='http://s0.wp.com/latex.php?latex=t%3D2&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=2' title='t=2' class='latex' /> we can throw candidate 2 out of the running.</p>

<table>
<tr><td colspan='2'><img src='http://s0.wp.com/latex.php?latex=t%3D1&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=1' title='t=1' class='latex' /></td><td colspan='2'><img src='http://s0.wp.com/latex.php?latex=t%3D2&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=2' title='t=2' class='latex' /></td><td colspan='2'><img src='http://s0.wp.com/latex.php?latex=t%3D3&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=3' title='t=3' class='latex' /></td></tr>
<tr>
<td><table border='0' class='mitchostable'>
<tr><th>C1</th><td><span class='bar' style='width:28px'>&nbsp;</span><span class='bound-right' style='width:70px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C2</th><td><span class='bar' style='width:59px'>&nbsp;</span><span class='bound-right' style='width:15px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C3</th><td><span class='bar' style='width:49px'>&nbsp;</span><span class='bound-right' style='width:40px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C4</th><td><span class='bar' style='width:83px'>&nbsp;</span><span class='bound-right' style='width:15px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th><strike>C5</strike></th><td><span class='bar' style='width:56px'>&nbsp;</span><span class='bound-right' style='width:6px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>&#8230;</th><td>&nbsp;</td><td>&nbsp;</td></tr>
</table></td><td>→</td>
<td><table border='0' class='mitchostable'>
<tr><th>C1</th><td><span class='bar' style='width:28px'>&nbsp;</span><span class='arrow-right' style='width:56px'>&nbsp;</span><span class='bound-right' style='width:14px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th><strike>C2</strike></th><td><span class='bar' style='width:59px'>&nbsp;</span><span class='arrow-right' style='width:5px'>&nbsp;</span><span class='bound-right' style='width:10px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C3</th><td><span class='bar' style='width:49px'>&nbsp;</span><span class='arrow-right' style='width:20px'>&nbsp;</span><span class='bound-right' style='width:20px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C4</th><td><span class='bar' style='width:83px'>&nbsp;</span><span class='arrow-right' style='width:6px'>&nbsp;</span><span class='bound-right' style='width:9px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th><strike>C5</strike></th><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><th>&#8230;</th><td>&nbsp;</td><td>&nbsp;</td></tr>
</table></td><td>→</td>
<td><table border='0' class='mitchostable'>
<tr><th>C1</th><td><span class='bar' style='width:28px'>&nbsp;</span><span class='arrow-right' style='width:56px'>&nbsp;</span><span class='arrow-right' style='width:4px'>&nbsp;</span><span class='bound-right' style='width:10px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th><strike>C2</strike></th><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><th>C3</th><td><span class='bar' style='width:49px'>&nbsp;</span><span class='arrow-right' style='width:20px'>&nbsp;</span><span class='arrow-right' style='width:15px'>&nbsp;</span><span class='bound-right' style='width:5px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th>C4</th><td><span class='bar' style='width:83px'>&nbsp;</span><span class='arrow-right' style='width:6px'>&nbsp;</span><span class='arrow-right' style='width:6px'>&nbsp;</span><span class='bound-right' style='width:3px'>&nbsp;</span></td><td>&nbsp;</td></tr>
<tr><th><strike>C5</strike></th><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><th>&#8230;</th><td>&nbsp;</td><td>&nbsp;</td></tr>
</table></td><td>→</td>
</tr>
</table>

<p>Calling this the &#8220;raising the bar&#8221; method refers to the fact that, at any particular time <img src='http://s0.wp.com/latex.php?latex=t&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t' title='t' class='latex' />, the &#8220;bar&#8221; is <img src='http://s0.wp.com/latex.php?latex=min%5Cleft%28%5Cleft%5C%7B%5Cmbox%7Bthe+%7Dn%5Cmbox%7B+greatest+%7DS_%7Bi%7D%28t%29%5Cmbox%7B+values%7D%5Cright%5C%7D%5Cright%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='min&#92;left(&#92;left&#92;{&#92;mbox{the }n&#92;mbox{ greatest }S_{i}(t)&#92;mbox{ values}&#92;right&#92;}&#92;right)' title='min&#92;left(&#92;left&#92;{&#92;mbox{the }n&#92;mbox{ greatest }S_{i}(t)&#92;mbox{ values}&#92;right&#92;}&#92;right)' class='latex' /> and every other candidate must have an upper bound <img src='http://s0.wp.com/latex.php?latex=B_j&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='B_j' title='B_j' class='latex' /> greater than the bar in order to not be thrown out of consideration. This &#8220;bar&#8221; itself is non-decreasing, just like the component scores, so the set of surviving candidates can only shrink over time.</p>
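
<p>Here is a minimal sketch of the bar-raising loop. The function name <code>raiseTheBar</code> and the candidate shape are assumptions made for this example. The numbers are loosely based on the pixel widths in the diagrams above, with candidate 5&#8217;s values chosen so that its bound sits below the bar at the first step. To stay on the safe side with ties, the sketch only eliminates a candidate whose bound is strictly below the bar.</p>

```javascript
// Hypothetical candidate shape: { name, score, bound, steps }
//   score: S_i at the first time step
//   bound: the fixed upper bound B_i
//   steps: the non-negative increments applied at later time steps
function raiseTheBar(candidates, n) {
  let alive = candidates.map(c => ({ ...c, steps: [...c.steps] }));
  const eliminated = [];
  const maxSteps = Math.max(...alive.map(c => c.steps.length));

  for (let t = 0; t <= maxSteps; t++) {
    // Only surviving candidates pay for the next score increment.
    if (t > 0) {
      for (const c of alive) {
        if (t - 1 < c.steps.length) c.score += c.steps[t - 1];
      }
    }
    if (alive.length > n) {
      // The bar is the n-th greatest current score.
      const bar = alive.map(c => c.score).sort((a, b) => b - a)[n - 1];
      alive = alive.filter(c => {
        if (c.bound >= bar) return true;
        eliminated.push({ name: c.name, at: t + 1 });
        return false;
      });
    }
  }
  const top = alive
    .sort((a, b) => b.score - a.score)
    .slice(0, n)
    .map(c => c.name);
  return { top, eliminated };
}

const sampleCandidates = [
  { name: "C1", score: 28, bound: 98, steps: [56, 4] },
  { name: "C2", score: 59, bound: 74, steps: [5, 3] },
  { name: "C3", score: 49, bound: 89, steps: [20, 15] },
  { name: "C4", score: 83, bound: 98, steps: [6, 6] },
  { name: "C5", score: 50, bound: 56, steps: [3, 2] },
];
console.log(raiseTheBar(sampleCandidates, 2));
// top: C4 then C1; eliminated: C5 at step 1, C2 at step 2
```

<p>Candidate 3 is never eliminated in this run because its bound stays above the bar, but it still loses out in the final ranking. Early elimination saves work where it safely can; it does not replace the final sort.</p>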

<p>In the case of <a href="http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/">the Ubiquity parser</a> we could build such a non-decreasing and bounded scoring model by using an additive model. As the main component of parser scoring is <a href="https://ubiquity.mozilla.com/trac/ticket/435">how well the parsed arguments match the verbs&#8217; specified nountypes</a>, we could simply add up all the confidence scores of each nountype suggestion, each of which is a value between 0 and 1. This would trivially be non-decreasing. As each parse has a finite and known number of parsed arguments, we could easily determine a bound as well. For example, say a parse <img src='http://s0.wp.com/latex.php?latex=S_0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_0' title='S_0' class='latex' /> has two arguments. Before we check each of the nountypes&#8217; match scores, we already know that <img src='http://s0.wp.com/latex.php?latex=S_0%28%5Cinfty%29+%5Cleq+2+%3D+B_0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_0(&#92;infty) &#92;leq 2 = B_0' title='S_0(&#92;infty) &#92;leq 2 = B_0' class='latex' />.</p>
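
<p>The additive model and its bound can be sketched in a few lines. The function name <code>additiveScore</code> and its return shape are assumptions for illustration. The <code>tightBound</code> field is my own addition rather than something described above: it simply notes that each argument not yet scored can contribute at most 1.</p>

```javascript
// confidences: nountype match scores computed so far, each assumed in [0, 1]
// totalArgs:   number of parsed arguments in this parse
function additiveScore(confidences, totalArgs) {
  const score = confidences.reduce((sum, c) => sum + c, 0);
  return {
    score,
    // Coarse bound from the text: every argument scores at most 1.
    bound: totalArgs,
    // Assumed refinement: only the unscored arguments can still add up to 1 each.
    tightBound: score + (totalArgs - confidences.length),
  };
}

console.log(additiveScore([], 2));          // score 0, bound 2, tightBound 2
console.log(additiveScore([0.5], 2));       // score 0.5, bound 2, tightBound 1.5
console.log(additiveScore([0.5, 0.25], 2)); // score 0.75, bound 2, tightBound 0.75
```

<p>Both bounds are valid; the tighter one just lets weak parses fall below the bar sooner.</p>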

<p>Unfortunately, there are also other factors which we would like to consider in our parses which may not fit into this non-decreasing model so easily&#8230;</p>

<h3>Approach 2&#8217;: The Rising Sun Model<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup></h3>

<p>One problem with both of the previous approaches is that they require the scoring scheme to be either non-increasing or non-decreasing across the derivation. There are many situations, however, where you would want different factors to affect the score both positively and negatively. In the case of the Ubiquity parser, here are some factors which could count positively or negatively in computing the score of each parse.</p>

<table>
<tr><th>positive factors</th><th>negative factors</th></tr>
<tr><td>the verb&#8217;s specified nountype matching the argument noun well</td><td>having to suggest the verb</td></tr>
<tr><td>the verb in the input matching the verb well</td><td>multiple arguments parsed for a single <a href='http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/'>semantic role</a></td></tr>
<tr><td>the verb being used often</td><td>the verb missing some arguments</td></tr>
</table>

<p>As we see, there are both positive and negative factors which we hope to consider in scoring our possible Ubiquity parses. The key to making this work is noting that Approach 2 only requires that the scoring series be bounded and non-decreasing <em>after a certain known time in the derivation</em>. For example, even if a parse involves a number of decreases early in the parse derivation, if after a certain point we can be certain that it is non-decreasing and bounded, we can simply use that bound and start eliminating poor candidates at that time (in this example, after <img src='http://s0.wp.com/latex.php?latex=t%3D2&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t=2' title='t=2' class='latex' />).</p>

<p><style type='text/css'>
.mitchostable2, .mitchostable2 tr, .mitchostable2 td, .mitchostable2 th {
  border:0;
  margin:0;
  padding:1px;
  text-align:left;
  vertical-align: bottom;
}
.mitchostable2 {
  background: transparent url(http://mitcho.com/i/000.png) repeat-x 0px 57px;
}
.mitchostable2 * {
  background: transparent;
}
.mitchostable2 span.bar { 
  background-color: #ccc;
  display: inline-block;
  width: 7px;
}
</style></p>

<table border='0' class='mitchostable2'>
<tr>
<td><span class='bar' style='height:150px'>&nbsp;</span></td><td><span class='bar' style='height:120px'>&nbsp;</span></td><td><span class='bar' style='height:90px'>&nbsp;</span></td><td><span class='bar' style='height:50px'>&nbsp;</span></td><td><span class='bar' style='height:60px'>&nbsp;</span></td><td><span class='bar' style='height:72px'>&nbsp;</span></td><td><span class='bar' style='height:80px'>&nbsp;</span></td><td><span class='bar' style='height:82px'>&nbsp;</span></td><td><span class='bar' style='height:90px'>&nbsp;</span></td><td><span class='bar' style='height:92px'>&nbsp;</span></td><td><span class='bar' style='height:92px'>&nbsp;</span></td><td><span class='bar' style='height:93px'>&nbsp;</span></td><td><span class='bar' style='height:93px'>&nbsp;</span></td><td><span class='bar' style='height:94px'>&nbsp;</span></td>
</tr>
<tr><td>0</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>5</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td colspan="3">10</td><td>&nbsp;</td></tr>
</table>

<p>This is very much possible in the Ubiquity parser. Given the <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">Ubiquity Parser 2 design</a>, the negative factors, such as whether the parse has a verb from the input or not (step 2), whether multiple arguments are identified with the same semantic role (step 4), and how many of the verb&#8217;s arguments are in the input (step 4), can all be identified early on in the derivation, before the very computationally intensive steps of nountype detection (step 7) and argument suggestion (step 8). In this way, we can front-load all the negative factors in scoring and continue to use a version of Approach 2 to optimize our parsing.</p>

<p>We can moreover make the effect of the negative factors be felt across the entire derivation by combining the negative factors into a single factor between 0 and 1 and multiplying it onto each of the positive factors being added. In other words, we can compute all the negative factors into a single <strong>score multiplier</strong> <img src='http://s0.wp.com/latex.php?latex=%5Cmu_i+%5Cin+%5B0%2C1%5D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mu_i &#92;in [0,1]' title='&#92;mu_i &#92;in [0,1]' class='latex' /> early in the derivation and then, when adding up the positive factors afterwards, simply apply that score multiplier to each of them:</p>

<p><center><img src='http://s0.wp.com/latex.php?latex=%5Cmu_%7Bi%7D%28%5Cmbox%7Bpositive+factor+0%7D%29+%2B+%5Cmu_%7Bi%7D%28%5Cmbox%7Bpositive+factor+1%7D%29+%2B+%5Cldots+%5Cmu_%7Bi%7D%28%5Cmbox%7Bpositive+factor+%7Dm%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mu_{i}(&#92;mbox{positive factor 0}) + &#92;mu_{i}(&#92;mbox{positive factor 1}) + &#92;ldots &#92;mu_{i}(&#92;mbox{positive factor }m)' title='&#92;mu_{i}(&#92;mbox{positive factor 0}) + &#92;mu_{i}(&#92;mbox{positive factor 1}) + &#92;ldots &#92;mu_{i}(&#92;mbox{positive factor }m)' class='latex' />.</center></p>

<p>This model is what is going on <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/2bc28033a723/ubiquity/index.html#modules/parser/tng/parser.js">under the hood</a> in <a href="http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/">Ubiquity Parser 2</a>. The <code>Parser.Parse</code> class has a property called <code>.scoreMultiplier</code> which contains the score multiplier <img src='http://s0.wp.com/latex.php?latex=%5Cmu_i&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mu_i' title='&#92;mu_i' class='latex' /> as described above. A method called <code>.getMaxScore()</code> is implemented in addition to <code>.getScore()</code> so that, even before all of the nountype suggestion scores have been computed (e.g., in the case of asynchronous suggestions), <code>.getMaxScore()</code> can be used as an upper bound <img src='http://s0.wp.com/latex.php?latex=B_i&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='B_i' title='B_i' class='latex' /> and compared to the in-progress scores of other candidates. Lower candidates can thus be taken out of consideration earlier in the parse process.</p>
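
<p>To make the relationship between the multiplier, the current score, and the upper bound concrete, here is a small JavaScript sketch. It is not the actual <code>Parser.Parse</code> code: the class <code>SketchParse</code>, its <code>argScores</code> array, and <code>setArgScore</code> are hypothetical names for this example, and only <code>scoreMultiplier</code>, <code>getScore</code>, and <code>getMaxScore</code> borrow their names from the description above.</p>

```javascript
// Illustrative only: SketchParse is a hypothetical stand-in, not the real
// Parser.Parse class from Ubiquity Parser 2.
class SketchParse {
  constructor(scoreMultiplier, argCount) {
    // mu_i: all negative factors, folded into one value in [0, 1].
    this.scoreMultiplier = scoreMultiplier;
    // One slot per argument; null means the nountype score is pending.
    this.argScores = new Array(argCount).fill(null);
  }

  setArgScore(index, score) {
    this.argScores[index] = score;
  }

  // Current score: the multiplier applied to each positive factor so far.
  getScore() {
    return this.argScores.reduce(
      (sum, s) => sum + this.scoreMultiplier * (s === null ? 0 : s),
      0
    );
  }

  // Upper bound B_i: assume every pending argument will score a perfect 1.
  getMaxScore() {
    return this.argScores.reduce(
      (sum, s) => sum + this.scoreMultiplier * (s === null ? 1 : s),
      0
    );
  }
}
```

<p>In this sketch <code>getMaxScore()</code> starts at the multiplier times the number of arguments and converges on <code>getScore()</code> as the pending scores arrive.</p>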

<h3>Conclusion</h3>

<p>In this blog post I&#8217;ve laid out a few iterations of approaches to the problem of scoring and ranking Ubiquity suggestions in a smart way. Some of the basic mechanisms, namely front-loading the negative factors into a <code>scoreMultiplier</code> and computing the <code>maxScore</code> (or upper bound), have been implemented. The actual optimization algorithm described here, which removes parses from consideration earlier in the parser query, has yet to be implemented in Ubiquity Parser 2, and I look forward to seeing it in action. In addition, there are surely factors I haven&#8217;t considered in the scoring or further tricks to improve the optimized scoring algorithm. <strong>I&#8217;d love to get your feedback and ideas on this topic.</strong> Thanks!</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:2">
<p>In the case of Ubiquity Parser 2, we&#8217;ll let the &#8220;time&#8221; values <img src='http://s0.wp.com/latex.php?latex=t&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t' title='t' class='latex' /> refer to the &#8220;steps&#8221; in the derivation, as laid out in <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">the Ubiquity Parser 2 design</a>. Note that these &#8220;steps&#8221; are currently done in parallel across all candidates in the current architecture, making the &#8220;time&#8221; analogy legitimate. I will thus use integer time values here, making this a <a href="http://en.wikipedia.org/wiki/discrete-time">discrete-time</a> model.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:1">
<p>Every Ubiquity parser query takes as a parameter the maximum number of suggestions to be returned. See <a href="https://ubiquity.mozilla.com/trac/ticket/532">the latest parser query interface proposal</a> for details on this interface.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:3">
<p>This naming is an homage to the <a href="http://en.wikipedia.org/wiki/rising sun lemma">rising sun lemma</a> of <a href="http://en.wikipedia.org/wiki/Frigyes Riesz">Frigyes Riesz</a> which uses a similar logic. The apparent connection to the fact that I am Japanese is purely coincidental.&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/observation/scoring-and-ranking-suggestions/' rel='bookmark' title='Scoring and Ranking Suggestions'>Scoring and Ranking Suggestions</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/' rel='bookmark' title='Ubiquity Parser: The Next Generation Demo'>Ubiquity Parser: The Next Generation Demo</a></li>
<li><a href='http://mitcho.com/blog/projects/this-week-on-ubiquity-parser-the-next-generation/' rel='bookmark' title='This week on Ubiquity Parser: The Next Generation'>This week on Ubiquity Parser: The Next Generation</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/observation/scoring-for-optimization/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Scoring and Ranking Suggestions</title>
		<link>http://mitcho.com/blog/observation/scoring-and-ranking-suggestions/</link>
		<comments>http://mitcho.com/blog/observation/scoring-and-ranking-suggestions/#comments</comments>
		<pubDate>Tue, 07 Apr 2009 07:17:26 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[observation]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[candidates]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[constraints]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[Optimality Theory]]></category>
		<category><![CDATA[order]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[ranking]]></category>
		<category><![CDATA[score]]></category>
		<category><![CDATA[suggestions]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1745</guid>
		<description><![CDATA[I just spent some time reviewing how Ubiquity currently ranks its suggestions in relation to Parser The Next Generation so I thought I&#8217;d put some of these thoughts down in writing. The issue of ranking Ubiquity suggestions can be restated as predicting an optimal output given a certain input and various conflicting considerations. Ubiquity [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/' rel='bookmark' title='Ubiquity Parser: The Next Generation Demo'>Ubiquity Parser: The Next Generation Demo</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/' rel='bookmark' title='Ubiquity in Firefox: Focus on Japanese'>Ubiquity in Firefox: Focus on Japanese</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-commands-by-the-numbers/' rel='bookmark' title='Ubiquity Commands by The Numbers'>Ubiquity Commands by The Numbers</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>I just spent some time reviewing how Ubiquity currently ranks its suggestions in relation to <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">Parser The Next Generation</a> so I thought I&#8217;d put some of these thoughts down in writing.</p>

<p>The issue of ranking Ubiquity suggestions can be restated as predicting an optimal output given a certain input and various conflicting considerations. Ubiquity (1.8, as of this writing) computes four &#8220;scores&#8221; for each suggestion:</p>

<p><span id="more-1745"></span></p>

<ol>
<li><code>duplicateDefaultMatchScore</code>: 100 by default—lowered if an unused argument gets multiple suggestions (in <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/file/0aaeae361c33/ubiquity/modules/parser/parser.js#l558">the words of the code</a>: &#8220;reduce the match score so that multiple entries with the same verb are only shown if there are no other verbs.&#8221;)</li>
<li><code>frequencyMatchScore</code>: a score from the <code>suggestion memory</code> of the frequency of the suggestion&#8217;s verb, given the input verb (currently the first word) or nothing, in the case of noun-first suggestions</li>
<li><code>verbMatchScore</code>: float in [0,1]: (as described <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_Documentation#Scoring_the_Quality_of_the_Verb_Match">here</a>)

<ul>
<li>0.75 is returned if it is a noun-first suggestion (by virtue of the fact that <code>String.indexOf('')==0</code>)</li>
<li>1 if the verb name is equivalent across input-output</li>
<li>in [0.75,1) if the input is a prefix of the suggestion verb name</li>
<li>in [0.5,0.75) if the input is a non-prefix substring of the suggestion verb</li>
<li>in [0.25,0.5] if the input is a prefix of one of the <code>synonyms</code></li>
<li>in [0,0.25) if the input is a non-prefix substring of one of the <code>synonyms</code></li>
</ul></li>
<li><code>argMatchScore</code>: the number of arguments with matching &#8220;specific&#8221; nountypes, where &#8220;specific&#8221; is designated by the nountype having property <code>rankLast=false</code>.</li>
</ol>

<p>With the numeric scores for each of these criteria, a partial order of suggestions is constructed using a <a href="http://en.wikipedia.org/wiki/lexicographic order">lexicographic order</a>: that is, compare candidates first using <code>duplicateDefaultMatchScore</code>, break ties using <code>frequencyMatchScore</code>, if still tied break using <code>verbMatchScore</code>, and if still tied break using <code>argMatchScore</code>. This paradigm of constraints is called &#8220;strictly ranked.&#8221; One corollary is that lower constraints, no matter how well a candidate scores on them, can never overcome a loss at a higher constraint. Another crucial corollary is that a candidate&#8217;s scores on lower constraints need not be computed if a higher constraint already dooms it to a lower position.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>
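
<p>The lexicographic comparison described above could be written as a sort comparator along the following lines. This is only a sketch: it assumes each suggestion is a plain object carrying the four scores as numeric properties, which is a simplification for the example and not how Ubiquity stores them.</p>

```javascript
// Criteria in strict ranking order, highest-ranked first.
const RANKED_CRITERIA = [
  "duplicateDefaultMatchScore",
  "frequencyMatchScore",
  "verbMatchScore",
  "argMatchScore",
];

// Comparator for Array.prototype.sort: higher scores sort first.
// A lower-ranked criterion is consulted only when all higher ones tie.
function compareSuggestions(a, b) {
  for (const key of RANKED_CRITERIA) {
    if (a[key] !== b[key]) return b[key] - a[key];
  }
  return 0; // tied on every criterion
}
```

<p>Because the loop returns at the first criterion that differs, a strong <code>argMatchScore</code> never rescues a suggestion that already lost on <code>duplicateDefaultMatchScore</code>.</p>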

<h3>Ranking in The Next Generation</h3>

<p>One of the goals of <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">Parser The Next Generation</a> is to make noun/argument-first inputs first-class citizens of Ubiquity, improving their suggestions in particular to the benefit of <a href="http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/">verb-final languages</a>. Arguments will be split up and tested against different noun types before a verb is even entered into the input, in which case target verbs can be ranked according to the appropriateness of the input&#8217;s arguments. As such, I believe the <code>argMatchScore</code> criterion above should either be ranked higher in a strictly ranked model or be allowed to overtake lower scores for the higher constraints in a non-strictly ranked model.</p>

<p>The <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">Parser The Next Generation</a> proposal and <a href="http://mitcho.com/code/ubiquity/parser-demo">demo</a> currently order suggestions using a product of various criteria&#8217;s scores, rather than a lexicographic order of strictly ranked constraints. The component factors are:</p>

<ol>
<li><code>0.5</code> for parses where the verb was suggested</li>
<li><code>0.5</code> for each extra (>1) <code>object</code> argument (essentially &#8220;unused words&#8221; in the previous parser)</li>
<li>the score of each argument against that semantic role&#8217;s target noun type</li>
<li><code>0.8</code> for each unset argument of that verb</li>
</ol>

<p>Each component score is a value in [0,1], so the product is always non-increasing across the derivation. This offers a natural way to optimize the candidate set creation: if a possible parse ever gets a score below a magic &#8220;threshold&#8221; value, it is immediately thrown away.</p>
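
<p>Here is a minimal JavaScript sketch of this multiplicative scheme with early discarding. Because every factor is at most 1, the running product can only stay the same or fall, so a parse that drops below the threshold can never recover. The function name and the threshold value of <code>0.2</code> are assumptions for illustration; the demo&#8217;s real threshold is not stated here.</p>

```javascript
// Hypothetical threshold; the text only calls it a "magic" value.
const THRESHOLD = 0.2;

// Multiply the factors in order and stop as soon as the running product
// drops below the threshold. Returns null for a discarded parse.
function scoreOrDiscard(factors, threshold = THRESHOLD) {
  let score = 1;
  for (const factor of factors) {
    score *= factor; // each factor is in [0, 1], so score never rises
    if (threshold > score) return null; // later factors are never evaluated
  }
  return score;
}
```

<p>For instance, a parse with a suggested verb (<code>0.5</code>) and one unset argument (<code>0.8</code>) survives with a product of 0.4, while three factors of <code>0.5</code> give 0.125 and the parse is dropped.</p>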

<p>A possible problem with the current Parser TNG scoring model is that it will implicitly hinder verbs and parses with more arguments, as they could have more sub-1 noun type score factors—this consideration may be great enough that a weighted additive model should be considered over a multiplicative one.</p>

<p><strong>How do you think we can make Ubiquity&#8217;s suggestion ranking smarter? What other factors should be considered, and what factors could be left out?</strong></p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>For all the linguists in the audience, if this sounds like <a href="http://en.wikipedia.org/wiki/Optimality Theory">Optimality Theory</a>, you would be right—there&#8217;s a little bit of <a href="http://roa.rutgers.edu/view.php3?roa=537">Prince and Smolensky (1993)</a> hanging out <a href="http://ubiquity.mozilla.com">in your browser</a>.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/' rel='bookmark' title='Ubiquity Parser: The Next Generation Demo'>Ubiquity Parser: The Next Generation Demo</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/' rel='bookmark' title='Ubiquity in Firefox: Focus on Japanese'>Ubiquity in Firefox: Focus on Japanese</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-commands-by-the-numbers/' rel='bookmark' title='Ubiquity Commands by The Numbers'>Ubiquity Commands by The Numbers</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/observation/scoring-and-ranking-suggestions/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Ubiquity Parser: The Next Generation Demo</title>
		<link>http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/</link>
		<comments>http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/#comments</comments>
		<pubDate>Wed, 18 Mar 2009 03:13:17 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[California]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[interface]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[overlord verbs]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[photo]]></category>
		<category><![CDATA[proposal]]></category>
		<category><![CDATA[semantic role]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[verb-final]]></category>
		<category><![CDATA[verbs]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1590</guid>
		<description><![CDATA[A week or two ago while visiting California, Jono and I had a productive charrette, resulting in a new architecture proposal for the Ubiquity parser, as laid out in Ubiquity Parser: The Next Generation. The new architecture is designed to support (1) the use of overlord verbs, (2) writing verbs by semantic roles, and (3) [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/' rel='bookmark' title='Writing commands with semantic roles'>Writing commands with semantic roles</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/' rel='bookmark' title='Ubiquity in Firefox: Focus on Japanese'>Ubiquity in Firefox: Focus on Japanese</a></li>
<li><a href='http://mitcho.com/blog/projects/user-aided-disambiguation-a-demo/' rel='bookmark' title='User-Aided Disambiguation: a demo'>User-Aided Disambiguation: a demo</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p><a href='http://mitcho.com/blog/wp-content/uploads/2009/03/parserdesign.jpg' rel='lightbox[parser]'><img src="http://mitcho.com/blog/wp-content/uploads/2009/03/parserdesign.jpg" alt="parserdesign" title="parserdesign" width="600" height="450" class="limages" /></a></p>

<p>A week or two ago while visiting California, <a href="http://jonoscript.wordpress.com">Jono</a> and I had a productive charrette, resulting in a new architecture proposal for the Ubiquity parser, as laid out in <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">Ubiquity Parser: The Next Generation</a>. The new architecture is designed to support (1) the use of <a href="http://jonoscript.wordpress.com/2009/01/24/overlord-verbs-a-proposal/">overlord verbs</a>, (2) <a href="http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/">writing verbs by semantic roles</a>, and (3) better suggestions for <a href="http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/">verb-final languages</a> and other argument-first contexts. I&#8217;m happy to say that I&#8217;ve spent some time putting a proof-of-concept together.</p>

<p>I&#8217;ve implemented the basic algorithm of this parser for <a href="http://en.wikipedia.org/wiki/left-branching">left-branching</a> languages (like English) and also implemented some fake English verbs, noun types, and semantic roles. This demo should give you a basic sense of how this parser will attempt to identify different types of arguments and check their noun types even without clearly knowing the verb. This should make the suggestion ranking much smarter, particularly for verb-final contexts. (For a good example, try <code>from Tokyo to San Francisco</code>.)</p>

<h3><a href="http://mitcho.com/code/ubiquity/parser-demo/">➔ Check out the Ubiquity next-gen parser demo</a></h3>

<p><span id="more-1590"></span></p>

<p>Clicking on the <em>environment info</em> will give you some information on the specific verbs, noun types, and roles implemented. You can also scroll through the <em>current parse</em> section to see the step by step derivation of how the suggested parses were constructed.</p>

<p>I&#8217;ll be flying about 15 hours in the next hour as I make my way back to Japan&#8230; hopefully I&#8217;ll make some more progress on the plane! I&#8217;ll look forward to your comments! <em>For those of you interested in checking out the code yourself, you can find it on <a href="http://bitbucket.org/mitcho/ubiquity-playground/">BitBucket</a>.</em></p>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/' rel='bookmark' title='Writing commands with semantic roles'>Writing commands with semantic roles</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/' rel='bookmark' title='Ubiquity in Firefox: Focus on Japanese'>Ubiquity in Firefox: Focus on Japanese</a></li>
<li><a href='http://mitcho.com/blog/projects/user-aided-disambiguation-a-demo/' rel='bookmark' title='User-Aided Disambiguation: a demo'>User-Aided Disambiguation: a demo</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>External orders in WordPress queries</title>
		<link>http://mitcho.com/blog/how-to/external-orders-in-wordpress-queries/</link>
		<comments>http://mitcho.com/blog/how-to/external-orders-in-wordpress-queries/#comments</comments>
		<pubDate>Sat, 29 Nov 2008 15:34:40 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[how to]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[filter]]></category>
		<category><![CDATA[hook]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[order]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[plugin]]></category>
		<category><![CDATA[query_posts]]></category>
		<category><![CDATA[ranking]]></category>
		<category><![CDATA[suggestions]]></category>
		<category><![CDATA[WordPress]]></category>
		<category><![CDATA[WordPress Planet]]></category>
		<category><![CDATA[WP_Query]]></category>
		<category><![CDATA[YARPP]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1102</guid>
		<description><![CDATA[The advanced WordPress user is intimately familiar with query_posts, the function which controls which posts are displayed in &#8220;The Loop.&#8221; query_posts gives plugin and theme writers the ability to display only posts written in January (query_posts("monthnum=1")) or disallow posts from a certain category (query_posts("cat=-529")). One of the parameters you can set here is orderby which [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/yet-another-related-posts-plugin-20/' rel='bookmark' title='Yet Another Related Posts Plugin 2.0'>Yet Another Related Posts Plugin 2.0</a></li>
<li><a href='http://mitcho.com/blog/projects/markdown-for-wordpress-and-bbpress/' rel='bookmark' title='Markdown for WordPress and bbPress'>Markdown for WordPress and bbPress</a></li>
<li><a href='http://mitcho.com/blog/projects/yet-another-related-posts-plugin/' rel='bookmark' title='Yet Another Related Posts Plugin'>Yet Another Related Posts Plugin</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>The advanced WordPress user is intimately familiar with <a href="http://codex.wordpress.org/Template_Tags/query_posts"><code>query_posts</code></a>, the function which controls which posts are displayed in &#8220;The Loop.&#8221; <code>query_posts</code> gives plugin and theme writers the ability to display only posts written in January (<code>query_posts("monthnum=1")</code>) or disallow posts from a certain category (<code>query_posts("cat=-529")</code><sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>). One of the parameters you can set here is <code>orderby</code> which affects the ordering of the posts returned, with allowed values such as <code>author</code>, <code>date</code>, or <code>title</code>. But what if you want to order your posts in some other order, defined outside of your <code>wp_posts</code> table? Here I&#8217;m going to lay out some thoughts on rolling your own external ordering source for WordPress queries.</p>

<p>In order to introduce an external ordering source, we need to do four things:</p>

<ol>
<li>create the external ordering source,</li>
<li>hook up (read &#8220;<code>join</code>&#8221;) the external ordering source,</li>
<li>make sure we use that order, and</li>
<li>make it play nice. ^^</li>
</ol>

<p>By the way, I&#8217;m going to assume you, dear reader, are PHP-savvy, proficient in MySQL, and already know a little about WordPress. This how-to is not for the PHPhobic.</p>

<p><span id="more-1102"></span></p>

<h3>The ordering source</h3>

<p>For this example, suppose we want to display posts by order of &#8220;interestingness.&#8221; We&#8217;ll just create a table called <code>wp_interestingness</code> with two columns, <code>ID</code> and <code>interestingness</code> and populate it with some data. We&#8217;ll even be nice to our database by making sure the <code>ID</code> is the primary key. Easy.</p>

<h3>Hook up the external ordering source</h3>

<p>When you run a query through <code>query_posts()</code> (or use <code>WP_Query</code>&#8217;s <code>query</code> method<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup>), what it&#8217;s doing is taking your special request and translating it into a MySQL statement. This means a query like <code>"monthnum=1"</code> is turned into <code>SELECT ... wp_posts.* FROM wp_posts WHERE 1=1 AND MONTH(wp_posts.post_date)='1' ...</code>. Every different query introduces something new to the basic <code>SELECT</code> command—in this case, the <code>AND MONTH(wp_posts.post_date)='1'</code>.</p>

<p>We first want to introduce the <code>interestingness</code> for each post and that means <code>join</code>ing the new table into the query. We&#8217;ll do this using the <code>posts_join</code> <a href="http://codex.wordpress.org/Plugin_API/Filter_Reference">filter</a>. This filter lets you add a <code>join</code> statement to the MySQL request.</p>


<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;">add_filter<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'posts_join'</span><span style="color: #339933;">,</span><span style="color: #0000ff;">'my_join_filter'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">function</span> my_join_filter<span style="color: #009900;">&#40;</span><span style="color: #000088;">$arg</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	<span style="color: #000088;">$arg</span> <span style="color: #339933;">.=</span> <span style="color: #0000ff;">&quot; natural join wp_interestingness &quot;</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">return</span> <span style="color: #000088;">$arg</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>


<p>Note that here we&#8217;re using <code>natural join</code> as <code>wp_posts</code> and <code>wp_interestingness</code> have only one column name in common, <code>ID</code>, and that&#8217;s exactly the column we want to join them on.</p>

<h3>Use the new order</h3>

<p>Now that we&#8217;ve <code>join</code>ed <code>wp_interestingness</code> in, we can refer to <code>wp_interestingness.interestingness</code> in our query. Note that, by default, <code>$wpdb-&gt;posts.post_date</code> is used to order the posts. We&#8217;ll use another filter here, this time <code>posts_orderby</code>, to patch this part of the query. We&#8217;ll search for the default <code>ORDER BY</code> value and replace it with our own <code>interestingness</code>.</p>


<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;">add_filter<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'posts_orderby'</span><span style="color: #339933;">,</span><span style="color: #0000ff;">'my_orderby_filter'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">function</span> my_orderby_filter<span style="color: #009900;">&#40;</span><span style="color: #000088;">$arg</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	<span style="color: #000000; font-weight: bold;">global</span> <span style="color: #000088;">$wpdb</span><span style="color: #339933;">;</span>
	<span style="color: #000088;">$arg</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;<span style="color: #006699; font-weight: bold;">$wpdb-&gt;posts</span>.post_date&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;wp_interestingness.interestingness&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$arg</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">return</span> <span style="color: #000088;">$arg</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>


<p>By the way, you can now check the resulting MySQL query by <code>echo</code>ing <code>$wp_query-&gt;request</code>. (If you&#8217;re using the <code>WP_Query</code> method I advocated below in footnote (2), you&#8217;ll of course have to change <code>$wp_query</code> to the <code>WP_Query</code> object you&#8217;re using.)</p>
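
<p>If you want to see what that <code>str_replace</code> does without touching a live blog, here&#8217;s a minimal stand-alone sketch. It assumes the default <code>wp_</code> table prefix (so <code>$wpdb-&gt;posts</code> is <code>wp_posts</code>) and that the clause arriving at the filter looks like <code>wp_posts.post_date DESC</code>; the variable names are made up for the illustration.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">&lt;?php
// Hypothetical stand-in for $wpdb-&gt;posts; assumes the default "wp_" prefix.
$posts_table = 'wp_posts';

// Assumed shape of the ORDER BY clause when it reaches posts_orderby.
$orderby = "$posts_table.post_date DESC";

// The same replacement that my_orderby_filter performs.
$patched = str_replace("$posts_table.post_date", 'wp_interestingness.interestingness', $orderby);

echo $patched, "\n";</pre></div></div>

<p>Only the column is swapped; whatever sort direction was already in the clause is left alone.</p>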

<h3>Learn to play nice ^^</h3>

<p>The instructions above do indeed work, but they also cause some major breakdowns in other functions of your blog. Why? That&#8217;s because the current code will edit your queries for every instance of The Loop: your index page, your archives, and your RSS feeds. You probably only want to order by interestingness in certain situations. What we need is a way to tell our (admittedly stupid) <code>my_join_filter</code> and <code>my_orderby_filter</code> when they should apply their <code>interestingness</code> magic and when they shouldn&#8217;t. There are several ways to set up such a system, but here I&#8217;ll lay out one that I feel is particularly elegant. We&#8217;ll set it up so you can actually use <code>query_posts("orderby=interestingness")</code> and it&#8217;ll know what you&#8217;re talking about.</p>

<p>One of the first things that happens in <code>query_posts</code> (indeed, well before the <code>posts_join</code> and <code>posts_orderby</code> filters run) is an action hook called <code>parse_query</code>. This lets us look at the initial state of the <code>WP_Query</code> object as it starts to run. In particular, we can look at the <code>orderby</code> query variable and see if we want to order by <code>interestingness</code>. If we do, we&#8217;ll set a global variable called <code>$use_interestingness_flag</code> to be <code>true</code>.</p>


<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;">add_action<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'parse_query'</span><span style="color: #339933;">,</span><span style="color: #0000ff;">'set_use_interestingness_flag'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">function</span> set_use_interestingness_flag<span style="color: #009900;">&#40;</span><span style="color: #000088;">$query</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	<span style="color: #000000; font-weight: bold;">global</span> <span style="color: #000088;">$use_interestingness_flag</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$query</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">query_vars</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'orderby'</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">==</span> <span style="color: #0000ff;">'interestingness'</span><span style="color: #009900;">&#41;</span>
		<span style="color: #000088;">$use_interestingness_flag</span> <span style="color: #339933;">=</span> <span style="color: #009900; font-weight: bold;">true</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">else</span>
		<span style="color: #000088;">$use_interestingness_flag</span> <span style="color: #339933;">=</span> <span style="color: #009900; font-weight: bold;">false</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>


<p>Now we just have to edit our filters so they only run when <code>$use_interestingness_flag == true</code>. We also will make sure to turn the flag back off in <code>my_orderby_filter</code>, as it&#8217;s our last filter to run during each query. It&#8217;s just like putting the seat back down after using a unisex bathroom.<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup></p>


<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;">add_filter<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'posts_join'</span><span style="color: #339933;">,</span><span style="color: #0000ff;">'my_join_filter'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">function</span> my_join_filter<span style="color: #009900;">&#40;</span><span style="color: #000088;">$arg</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	<span style="color: #000000; font-weight: bold;">global</span> <span style="color: #000088;">$use_interestingness_flag</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$use_interestingness_flag</span><span style="color: #009900;">&#41;</span>
		<span style="color: #000088;">$arg</span> <span style="color: #339933;">.=</span> <span style="color: #0000ff;">&quot; natural join wp_interestingness &quot;</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">return</span> <span style="color: #000088;">$arg</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
add_filter<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'posts_orderby'</span><span style="color: #339933;">,</span><span style="color: #0000ff;">'my_orderby_filter'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">function</span> my_orderby_filter<span style="color: #009900;">&#40;</span><span style="color: #000088;">$arg</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	<span style="color: #000000; font-weight: bold;">global</span> <span style="color: #000088;">$wpdb</span><span style="color: #339933;">,</span> <span style="color: #000088;">$use_interestingness_flag</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$use_interestingness_flag</span><span style="color: #009900;">&#41;</span>
		<span style="color: #000088;">$arg</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;<span style="color: #006699; font-weight: bold;">$wpdb-&gt;posts</span>.post_date&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;wp_interestingness.interestingness&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$arg</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #000088;">$use_interestingness_flag</span> <span style="color: #339933;">=</span> <span style="color: #009900; font-weight: bold;">false</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">return</span> <span style="color: #000088;">$arg</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>


<p>This method has a great advantage as you can just set it up once and invoke it whenever you want, even together with other parameters, without any additional code. For example, you can try <code>query_posts("monthnum=1&amp;orderby=interestingness")</code> or <code>query_posts("cat=-529&amp;orderby=interestingness")</code>.</p>
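
<p>To make the life cycle of the flag concrete, here&#8217;s a rough simulation that runs without WordPress. The <code>demo_*</code> functions are hypothetical stand-ins for the <code>parse_query</code> and <code>posts_orderby</code> steps, not real WordPress APIs, and <code>parse_str</code> is only an approximation of how the query string gets split into query variables.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">&lt;?php
// Stand-alone simulation: none of these demo_* functions are WordPress APIs.
$use_interestingness_flag = false;

// Mimics the parse_query step: look at the orderby query variable.
function demo_parse_query($query_string) {
	global $use_interestingness_flag;
	parse_str($query_string, $vars);
	$use_interestingness_flag = isset($vars['orderby']) &amp;&amp; $vars['orderby'] == 'interestingness';
}

// Mimics the posts_orderby step, then puts the seat back down.
function demo_orderby($orderby) {
	global $use_interestingness_flag;
	if ($use_interestingness_flag)
		$orderby = str_replace('wp_posts.post_date', 'wp_interestingness.interestingness', $orderby);
	$use_interestingness_flag = false;
	return $orderby;
}

// First query asks for interestingness; the second one doesn't.
demo_parse_query('monthnum=1&amp;orderby=interestingness');
$first = demo_orderby('wp_posts.post_date DESC');

demo_parse_query('monthnum=1');
$second = demo_orderby('wp_posts.post_date DESC');

echo $first, "\n";
echo $second, "\n";</pre></div></div>

<p>The first clause gets patched and the second is left untouched, which is exactly the behavior we want from queries that never mention <code>interestingness</code>.</p>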

<h3>Conclusion</h3>

<p>Adding an external ordering source to your WordPress post queries can be relatively straightforward if you understand what <code>query_posts</code> does and take advantage of its <a href="http://codex.wordpress.org/Plugin_API">hooks</a>. This tutorial can also serve as the basis for many other patches to <code>WP_Query</code>, not just the <code>orderby</code> parameter. To better understand the way WordPress builds its MySQL queries and the many <code>posts_*</code> filters which you can take advantage of, go to the source: <code>wp-includes/query.php</code>. Finally, you can use the special <code>parse_query</code> hook and global variables as flags to only apply the filters when necessary.</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>This, incidentally, is precisely what I do to hide, by default, <a href="http://twitter.com/mitchoyoshitaka/">my tweets</a> in my <code>index.php</code> and <code>archives.php</code>.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:2">
<p>If you&#8217;re going to get serious about rolling your own WordPress queries, I highly recommend you follow <a href="http://weblogtoolscollection.com/archives/2008/04/13/define-your-own-wordpress-loop-using-wp_query/">Mark Ghosh&#8217;s advice</a> on initializing another object of the <code>WP_Query</code> class and using the <code>query</code> method, rather than just using the global <code>query_posts</code> function.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:3">
<p>The perceptive reader will note that we are still searching for the string <code>"$wpdb-&gt;posts.post_date"</code> in <code>my_orderby_filter</code>, instead of something like <code>"$wpdb-&gt;posts.interestingness"</code>. That&#8217;s because the <code>orderby</code> value of <code>interestingness</code> is not one of the allowed <code>orderby</code> values (search for <code>$allowed_keys</code> in <code>wp-includes/query.php</code> to see the list). Thus the MySQL <code>ORDER BY</code> value is set to the default of <code>"$wpdb-&gt;posts.post_date"</code> before it gets to the <code>posts_orderby</code> filter. Now you know.&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/yet-another-related-posts-plugin-20/' rel='bookmark' title='Yet Another Related Posts Plugin 2.0'>Yet Another Related Posts Plugin 2.0</a></li>
<li><a href='http://mitcho.com/blog/projects/markdown-for-wordpress-and-bbpress/' rel='bookmark' title='Markdown for WordPress and bbPress'>Markdown for WordPress and bbPress</a></li>
<li><a href='http://mitcho.com/blog/projects/yet-another-related-posts-plugin/' rel='bookmark' title='Yet Another Related Posts Plugin'>Yet Another Related Posts Plugin</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/how-to/external-orders-in-wordpress-queries/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Yet Another Related Posts Plugin 2.0</title>
		<link>http://mitcho.com/blog/projects/yet-another-related-posts-plugin-20/</link>
		<comments>http://mitcho.com/blog/projects/yet-another-related-posts-plugin-20/#comments</comments>
		<pubDate>Sun, 13 Jul 2008 15:06:06 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[plugin]]></category>
		<category><![CDATA[WordPress]]></category>
		<category><![CDATA[WordPress Planet]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=521</guid>
		<description><![CDATA[Yet Another Related Posts Plugin 2.0.5 16&#160;kb - zip Well, it&#8217;s been a while since I updated my plugin YARPP&#8212;in my humble opinion the best related posts plugin for WordPress. ^^ Today I release version 2.0, incorporating a number of important requests and bug fixes: New algorithm which considers tags and categories, by frequent request [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/yet-another-related-posts-plugin/' rel='bookmark' title='Yet Another Related Posts Plugin'>Yet Another Related Posts Plugin</a></li>
<li><a href='http://mitcho.com/blog/projects/modifiying-wordpress-plugin-activation-behavior/' rel='bookmark' title='Modifiying WordPress plugin activation behavior'>Modifiying WordPress plugin activation behavior</a></li>
<li><a href='http://mitcho.com/blog/projects/markdown-for-wordpress-and-bbpress/' rel='bookmark' title='Markdown for WordPress and bbPress'>Markdown for WordPress and bbPress</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<div class="files">
<div class="file zip">
<a href="http://downloads.wordpress.org/plugin/yet-another-related-posts-plugin.2.0.5.zip">Yet Another Related Posts Plugin 2.0.5</a><br />
<span class="specs">16&#160;kb - zip</span>
</div>
</div>

<p>Well, it&#8217;s been a while since I updated my plugin <a href="http://mitcho.com/code/yarpp/">YARPP</a>&#8212;in my humble opinion the best related posts plugin for <a href="http://www.wordpress.org">WordPress</a>. ^^ Today I release <a href="http://downloads.wordpress.org/plugin/yet-another-related-posts-plugin.2.0.zip">version 2.0</a>, incorporating a number of important requests and bug fixes:</p>

<ul>
<li>New algorithm which considers tags and categories, by frequent request</li>
<li>Order by score, date, or title, <a href="http://wordpress.org/support/topic/158459">by request</a></li>
<li>Excluding certain tags or categories, <a href="http://wordpress.org/support/topic/161263">by request</a></li>
<li>Sample output displayed in the options screen</li>
<li>Bugfix: <a href="http://wordpress.org/support/topic/155034?replies=5">an excerpt length bug</a></li>
<li>Bugfix: now compatible with the following plugins:

<ul>
<li>diggZEt</li>
<li>WP-Syntax</li>
<li>Viper&#8217;s Video Quicktags</li>
<li>WP-CodeBox</li>
<li>WP shortcodes</li>
</ul></li>
</ul>

<p>Check out the <a href="http://mitcho.com/code/yarpp/">Yet Another Related Posts Plugin page on this site</a>, <a href="http://wordpress.org/extend/plugins/yet-another-related-posts-plugin/">the page on wordpress.org</a>, or <a href="http://downloads.wordpress.org/plugin/yet-another-related-posts-plugin.2.0.zip">download it directly now</a>!</p>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/yet-another-related-posts-plugin/' rel='bookmark' title='Yet Another Related Posts Plugin'>Yet Another Related Posts Plugin</a></li>
<li><a href='http://mitcho.com/blog/projects/modifiying-wordpress-plugin-activation-behavior/' rel='bookmark' title='Modifiying WordPress plugin activation behavior'>Modifiying WordPress plugin activation behavior</a></li>
<li><a href='http://mitcho.com/blog/projects/markdown-for-wordpress-and-bbpress/' rel='bookmark' title='Markdown for WordPress and bbPress'>Markdown for WordPress and bbPress</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/yet-another-related-posts-plugin-20/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Yet Another Related Posts Plugin</title>
		<link>http://mitcho.com/blog/projects/yet-another-related-posts-plugin/</link>
		<comments>http://mitcho.com/blog/projects/yet-another-related-posts-plugin/#comments</comments>
		<pubDate>Sat, 29 Dec 2007 13:49:20 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[metablog]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[plugin]]></category>
		<category><![CDATA[threshold]]></category>
		<category><![CDATA[WordPress]]></category>
		<category><![CDATA[WordPress Planet]]></category>
		<category><![CDATA[YARPP]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/2007/12/29/yet-another-related-posts-plugin/</guid>
		<description><![CDATA[UPDATE: This posting is now outdated&#8230; for the latest information on YARPP, please visit YARPP&#8217;s very own page on my site, or its page on wordpress.org. If you have questions, please submit on the wordpress.org forum. Thanks! Description Today I&#8217;m releasing Yet Another Related Posts Plugin (YARPP1) 1.0 for WordPress. It&#8217;s the result of some [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/introducing-smartdate/' rel='bookmark' title='Introducing Smartdate'>Introducing Smartdate</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<h3>UPDATE:</h3>

<p>This posting is now outdated&#8230; for the latest information on YARPP, please visit <a href="/code/yarpp/">YARPP&#8217;s very own page</a> on my site, or <a href="http://wordpress.org/extend/plugins/yet-another-related-posts-plugin">its page on <code>wordpress.org</code></a>. If you have questions, please post them on <a href="http://wordpress.org/tags/yet-another-related-posts-plugin">the <code>wordpress.org</code> forum</a>. Thanks!</p>

<h3>Description</h3>

<p>Today I&#8217;m releasing Yet Another Related Posts Plugin (YARPP<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>) 1.0 for <a href="http://www.wordpress.org">WordPress</a>. It&#8217;s the result of some tinkering with <a href="http://peter.mapledesign.co.uk/weblog/archives/wordpress-related-posts-plugin">Peter Bowyer&#8217;s version</a> of <a href="http://wasabi.pbwiki.com/Related%20Entries">Alexander Malov &amp; Mike Lu&#8217;s Related Entries plugin</a>. Modifications made include:</p>

<ol>
<li><em>Limiting by a threshold</em>: Peter Bowyer did the great work of making the algorithm use <a href="http://en.wikipedia.org/wiki/mysql">MySQL</a>&#8217;s <a href="http://dev.mysql.com/doc/en/Fulltext_Search.html">fulltext search</a> score to identify related posts. But until now it just displayed, for example, the top 5 most &#8220;relevant&#8221; entries, even if some of them weren&#8217;t at all similar. Now you can set a threshold limit<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup> for relevance, and you get more related posts if there are more related posts and fewer if there are fewer. Ha!</li>
<li><em>Being a better plugin citizen</em>: now it doesn&#8217;t require the user to click some sketchy button to <code>alter</code> the database and enable a <code>fulltext key</code>. Using <a href="http://codex.wordpress.org/Function_Reference/register_activation_hook"><code>register_activation_hook</code></a>, it does it automagically on plugin activation. Just install and go!</li>
<li><em>Miscellany</em>: a nicer options screen, displaying the fulltext match score on output for admins, an option to allow related posts from the future, a couple bug fixes, etc.</li>
</ol>
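
<p>To give a feel for what limiting by a threshold means, here&#8217;s a small stand-alone sketch. It is not YARPP&#8217;s actual code: the <code>demo_filter_related</code> helper, the made-up scores, and the choice to treat a score equal to the threshold as passing are all assumptions for the sake of illustration.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">&lt;?php
// Hypothetical helper, not YARPP's actual code: keep the posts whose score
// meets the threshold, best first, up to $limit results.
function demo_filter_related(array $scores, $threshold, $limit) {
	$related = array();
	foreach ($scores as $post_id =&gt; $score) {
		if ($score &gt;= $threshold)
			$related[$post_id] = $score;
	}
	arsort($related); // highest score first, post IDs preserved as keys
	return array_slice($related, 0, $limit, true);
}

// Made-up fulltext scores keyed by post ID.
$scores = array(12 =&gt; 7.5, 40 =&gt; 1.2, 7 =&gt; 5.1, 33 =&gt; 0.4);

// With a threshold of 5 and a limit of 5, only two posts make the cut.
print_r(demo_filter_related($scores, 5, 5));</pre></div></div>

<p>Without the threshold, a limit of 5 would have happily listed all four posts, including the ones that barely match.</p>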

<h3>Installation</h3>

<p>Just put it in your <code>/wp-content/plugins/</code> directory, activate, and then drop the <code>related_posts</code> function in your <a href="http://codex.wordpress.org/The_Loop">WP loop</a>. Change any options in the Related Posts (YARPP) Options pane in Admin > Plugins.</p>

<p>You can override any options in an individual instance of <code>related_posts</code> using the following syntax:</p>

<blockquote>
  <p><code>related_posts(limit, threshold, before title, after title, show excerpt, len, before excerpt, after excerpt, show password-protected posts, past only, show score);</code></p>
</blockquote>

<p>Most of these should be self-explanatory. They&#8217;re also in the same order as the options on the YARPP Options pane.</p>

<p>Example: <code>related_posts(10, null, 'title: ')</code> changes the maximum related posts number to 10, keeps the default threshold from the Options pane, and adds <code>title:</code> to the beginning of every title.</p>

<p>There&#8217;s also a <code>related_posts_exist()</code> function. It has three optional arguments to override the defaults: a threshold, the past only boolean, and the show password-protected posts boolean.</p>

<h3>Examples</h3>

<p>For a barebones setup, just drop <code>&lt;?php related_posts(); ?&gt;</code> right after <code>&lt;?php the_content() ?&gt;</code>.</p>

<p>On my own blog I use the following code with <code>&lt;li&gt;</code> and <code>&lt;/li&gt;</code> as the before/after entry options:</p>


<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>related_posts_exist<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">:</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>
&lt;p&gt;Related posts:&lt;/p&gt;
&lt;ol&gt;
<span style="color: #000000; font-weight: bold;">&lt;?php</span> related_posts<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>
&lt;/ol&gt;
<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">else</span><span style="color: #339933;">:</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>
&lt;p&gt;No related posts.&lt;/p&gt;
<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">endif</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span></pre></div></div>


<h3>Coming soon (probably)</h3>

<ol>
<li>Incorporation of tags and categories in the algorithm. I&#8217;ve gotten the code working, but I still need to think about what the most natural algorithm would be for weighing these factors against the MySQL fulltext score currently used (which works pretty well, I must say).</li>
<li>Um, something else! Let me know if you have any suggestions for improvement. ^^</li>
</ol>

<h3>Version log</h3>

<p>1.0   Initial upload (20071229)</p>

<p>1.0.1 Bugfix: 1.0 assumed you had Markdown installed (20080105)</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>Pronounced &#8220;yarp!&#8221;, kind of like this, but maybe with a little more joy:<br /><object width="425" height="355"><param name="movie" value="http://www.youtube.com/v/7cOuGJMRORw&#038;rel=1"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/7cOuGJMRORw&#038;rel=1" type="application/x-shockwave-flash" wmode="transparent" width="425" height="355"></embed></object>&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:2">
<p>Did you know that threshold has only two h&#8217;s!? I&#8217;m incensed and just went through and replaced all the instances of <code>threshhold</code> in my code. It&#8217;s really not a thresh-hold!?&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/introducing-smartdate/' rel='bookmark' title='Introducing Smartdate'>Introducing Smartdate</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/yet-another-related-posts-plugin/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
	</channel>
</rss>

