<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>mitcho.com &#187; constraints</title>
	<atom:link href="http://mitcho.com/blog/tag/constraints/feed/" rel="self" type="application/rss+xml" />
	<link>http://mitcho.com</link>
	<description></description>
	<lastBuildDate>Fri, 10 Feb 2012 23:24:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4-alpha-19719</generator>
		<item>
		<title>Scoring and Ranking Suggestions</title>
		<link>http://mitcho.com/blog/observation/scoring-and-ranking-suggestions/</link>
		<comments>http://mitcho.com/blog/observation/scoring-and-ranking-suggestions/#comments</comments>
		<pubDate>Tue, 07 Apr 2009 07:17:26 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[observation]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[candidates]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[constraints]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[Optimality Theory]]></category>
		<category><![CDATA[order]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[ranking]]></category>
		<category><![CDATA[score]]></category>
		<category><![CDATA[suggestions]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1745</guid>
		<description><![CDATA[I just spent some time reviewing how Ubiquity currently ranks its suggestions in relation to Parser The Next Generation, so I thought I&#8217;d put some of these thoughts down in writing. The issue of ranking Ubiquity suggestions can be restated as predicting an optimal output given a certain input and various conflicting considerations. Ubiquity [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/' rel='bookmark' title='Ubiquity Parser: The Next Generation Demo'>Ubiquity Parser: The Next Generation Demo</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/' rel='bookmark' title='Ubiquity in Firefox: Focus on Japanese'>Ubiquity in Firefox: Focus on Japanese</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-commands-by-the-numbers/' rel='bookmark' title='Ubiquity Commands by The Numbers'>Ubiquity Commands by The Numbers</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>I just spent some time reviewing how Ubiquity currently ranks its suggestions in relation to <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">Parser The Next Generation</a>, so I thought I&#8217;d put some of these thoughts down in writing.</p>

<p>The issue of ranking Ubiquity suggestions can be restated as predicting an optimal output given a certain input and various conflicting considerations. Ubiquity (1.8, as of this writing) computes four &#8220;scores&#8221; for each suggestion:</p>

<p><span id="more-1745"></span></p>

<ol>
<li><code>duplicateDefaultMatchScore</code>: 100 by default—lowered if an unused argument gets multiple suggestions (in <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/file/0aaeae361c33/ubiquity/modules/parser/parser.js#l558">the words of the code</a>: &#8220;reduce the match score so that multiple entries with the same verb are only shown if there are no other verbs.&#8221;)</li>
<li><code>frequencyMatchScore</code>: a score from the <code>suggestion memory</code> of the frequency of the suggestion&#8217;s verb, given the input verb (currently the first word) or nothing, in the case of noun-first suggestions</li>
<li><code>verbMatchScore</code>: a float in [0,1] (as described <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_Documentation#Scoring_the_Quality_of_the_Verb_Match">here</a>):

<ul>
<li>0.75 if it is a noun-first suggestion (by virtue of the fact that <code>String.indexOf('')==0</code>)</li>
<li>1 if the verb name is equivalent across input-output</li>
<li>in [0.75,1) if the input is a prefix of the suggestion verb name</li>
<li>in [0.5,0.75) if the input is a non-prefix substring of the suggestion verb</li>
<li>in [0.25,0.5] if the input is a prefix of one of the <code>synonyms</code></li>
<li>in [0,0.25) if the input is a non-prefix substring of one of the <code>synonyms</code></li>
</ul></li>
<li><code>argMatchScore</code>: the number of arguments with matching &#8220;specific&#8221; nountypes, where &#8220;specific&#8221; is designated by the nountype having property <code>rankLast=false</code>.</li>
</ol>
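
<p>To make the <code>verbMatchScore</code> tiers concrete, here is a minimal sketch that only reports which interval an input falls into. The function name <code>verbMatchTier</code>, the shape of the <code>verb</code> object, and the <code>null</code> return for a non-match are my own assumptions for illustration. It does not reproduce how Ubiquity picks the exact value inside each interval.</p>

```javascript
// Hypothetical helper: returns the score interval that the tiers listed
// above assign to an input/verb pair. It says which tier applies, not the
// exact value Ubiquity would compute within that tier.
function verbMatchTier(input, verb) {
  const name = verb.name;
  const synonyms = verb.synonyms || [];

  // Noun-first suggestion: the input verb is the empty string, and
  // "anything".indexOf("") === 0, so it sits at the prefix floor of 0.75.
  if (input === "") return { min: 0.75, max: 0.75 };

  // Exact match on the verb name.
  if (input === name) return { min: 1, max: 1 };

  const pos = name.indexOf(input);
  if (pos === 0) return { min: 0.75, max: 1 };  // prefix of the verb name
  if (pos > 0) return { min: 0.5, max: 0.75 };  // non-prefix substring

  if (synonyms.some((s) => s.indexOf(input) === 0)) {
    return { min: 0.25, max: 0.5 };             // prefix of a synonym
  }
  if (synonyms.some((s) => s.indexOf(input) > 0)) {
    return { min: 0, max: 0.25 };               // substring of a synonym
  }
  return null; // assumption: no match means the verb is not suggested
}

// Usage with a made-up verb definition:
const translate = { name: "translate", synonyms: ["convert"] };
console.log(verbMatchTier("trans", translate)); // prefix tier
console.log(verbMatchTier("late", translate));  // substring tier
console.log(verbMatchTier("conv", translate));  // synonym-prefix tier
```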

<p>With the numeric scores for each of these criteria, a partial order of suggestions is constructed using a <a href="http://en.wikipedia.org/wiki/Lexicographic_order">lexicographic order</a>: candidates are compared first on <code>duplicateDefaultMatchScore</code>, ties are broken using <code>frequencyMatchScore</code>, then <code>verbMatchScore</code>, and finally <code>argMatchScore</code>. This paradigm of constraints is called &#8220;strictly ranked&#8221;: a lower constraint, no matter how well a candidate scores on it, can never overcome a loss at a higher constraint. A crucial corollary is that a candidate&#8217;s scores on lower constraints need not be computed if a higher constraint already dooms it to a lower position.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>
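
<p>A strictly ranked comparison like this can be sketched as a sort comparator. The suggestion objects below, with the four scores stored as plain properties, are a hypothetical shape chosen for illustration and not the parser&#8217;s actual data structure.</p>

```javascript
// Constraint names from the list above, highest-ranked first.
const RANKING = [
  "duplicateDefaultMatchScore",
  "frequencyMatchScore",
  "verbMatchScore",
  "argMatchScore",
];

// Comparator for Array.prototype.sort: negative when `a` should come first.
// Higher scores win, and a lower-ranked constraint is consulted only when
// every higher-ranked constraint is tied.
function compareSuggestions(a, b) {
  for (const key of RANKING) {
    if (a[key] !== b[key]) return b[key] - a[key];
  }
  return 0;
}

// Made-up candidates. "define" has the best frequency, verb, and argument
// scores, but it loses on the top-ranked constraint and so ends up last.
const suggestions = [
  { verb: "email", duplicateDefaultMatchScore: 100, frequencyMatchScore: 2, verbMatchScore: 0.8, argMatchScore: 3 },
  { verb: "edit", duplicateDefaultMatchScore: 100, frequencyMatchScore: 5, verbMatchScore: 0.6, argMatchScore: 0 },
  { verb: "define", duplicateDefaultMatchScore: 50, frequencyMatchScore: 9, verbMatchScore: 1, argMatchScore: 3 },
];

const ranked = suggestions.slice().sort(compareSuggestions).map((s) => s.verb);
console.log(ranked); // ["edit", "email", "define"]
```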

<h3>Ranking in The Next Generation</h3>

<p>One of the goals of <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">Parser The Next Generation</a> is to make noun/argument-first input a first-class citizen of Ubiquity, improving the suggestions for such input, to the benefit of <a href="http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/">verb-final languages</a> in particular. Arguments will be split up and tested against different noun types before a verb is even entered into the input, in which case target verbs can be ranked according to the appropriateness of the input&#8217;s arguments. As such, I believe the <code>argMatchScore</code> criterion above should either be ranked higher in a strictly ranked model or be allowed to overtake lower scores for the higher constraints in a non-strictly ranked model.</p>

<p>The <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">Parser The Next Generation</a> proposal and <a href="http://mitcho.com/code/ubiquity/parser-demo">demo</a> currently order suggestions by a product of the scores for various criteria, rather than by a lexicographic order of strictly ranked constraints. The component factors are:</p>

<ol>
<li><code>0.5</code> for parses where the verb was suggested</li>
<li><code>0.5</code> for each extra <code>object</code> argument beyond the first (essentially &#8220;unused words&#8221; in the previous parser)</li>
<li>the score of each argument against that semantic role&#8217;s target noun type</li>
<li><code>0.8</code> for each unset argument of that verb</li>
</ol>

<p>Each component score is a value in [0,1], so the running product is always non-increasing across the derivation. This offers a natural way to optimize the candidate set creation: if a possible parse ever gets a score below a magic &#8220;threshold&#8221; value, it is immediately thrown away, since no later factor can raise its score back above the threshold.</p>
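
<p>Here is a rough sketch of this kind of multiplicative scoring with early pruning. The factor values come from the list above, but the <code>THRESHOLD</code> value, the function names, and the fields of the <code>parse</code> object are all assumptions made for illustration, not the demo&#8217;s actual code.</p>

```javascript
// Hypothetical cutoff; the real threshold value is not given in this post.
const THRESHOLD = 0.3;

// Collects the component factors for a parse, following the list above.
// The `parse` fields used here are invented for this sketch.
function factorsFor(parse) {
  const factors = [];
  if (parse.verbSuggested) factors.push(0.5);
  for (let i = 0; i < parse.extraObjects; i++) factors.push(0.5);
  factors.push(...parse.nounTypeScores);
  for (let i = 0; i < parse.unsetArgs; i++) factors.push(0.8);
  return factors;
}

// Multiplies the factors in order and gives up as soon as the running
// product drops below the threshold. Returns the score, or null if pruned.
function scoreParse(factors, threshold = THRESHOLD) {
  let score = 1;
  for (const factor of factors) {
    score *= factor; // each factor is in [0,1], so score never increases
    if (score < threshold) return null;
  }
  return score;
}

// One well-matched argument and one unset argument: 0.9 * 0.8.
const kept = scoreParse(
  factorsFor({ verbSuggested: false, extraObjects: 0, nounTypeScores: [0.9], unsetArgs: 1 })
);

// Suggested verb plus one extra object: 0.5 * 0.5 is already below the
// threshold, so the noun type scores are never even multiplied in.
const pruned = scoreParse(
  factorsFor({ verbSuggested: true, extraObjects: 1, nounTypeScores: [0.9], unsetArgs: 0 })
);
console.log(kept, pruned);
```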

<p>A possible problem with the current Parser TNG scoring model is that it implicitly penalizes verbs and parses with more arguments, since each additional argument can contribute another sub-1 noun type score factor. This consideration may be great enough that a weighted additive model should be considered over a multiplicative one.</p>
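
<p>The effect is easy to see with numbers. The sketch below compares a plain product against one possible additive alternative, a weighted mean. The weighted mean is only my illustration of what &#8220;weighted additive&#8221; could look like. It is not part of the proposal, and the weights are made up.</p>

```javascript
// Multiplicative model: every sub-1 argument score drags the total down.
function productScore(argScores) {
  return argScores.reduce((acc, s) => acc * s, 1);
}

// One possible additive alternative (an assumption, not the proposal's
// formula): a weighted mean, which stays in [0,1] however many arguments
// the parse has.
function weightedMeanScore(argScores, weights) {
  const totalWeight = weights.reduce((acc, w) => acc + w, 0);
  const weighted = argScores.reduce((acc, s, i) => acc + s * weights[i], 0);
  return weighted / totalWeight;
}

// The same per-argument quality, with one argument versus three.
const oneArg = [0.9];
const threeArgs = [0.9, 0.9, 0.9];

// The product falls as arguments are added, while the mean does not.
console.log(productScore(oneArg), productScore(threeArgs));
console.log(weightedMeanScore(oneArg, [1]), weightedMeanScore(threeArgs, [1, 1, 1]));
```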

<p><strong>How do you think we can make Ubiquity&#8217;s suggestion ranking smarter? What other factors should be considered, and what factors could be left out?</strong></p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>For all the linguists in the audience: if this sounds like <a href="http://en.wikipedia.org/wiki/Optimality_Theory">Optimality Theory</a>, you&#8217;d be right. There&#8217;s a little bit of <a href="http://roa.rutgers.edu/view.php3?roa=537">Prince and Smolensky (1993)</a> hanging out <a href="http://ubiquity.mozilla.com">in your browser</a>.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/observation/scoring-and-ranking-suggestions/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

