<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>mitcho.com &#187; French</title>
	<atom:link href="http://mitcho.com/blog/tag/french/feed/" rel="self" type="application/rss+xml" />
	<link>http://mitcho.com</link>
	<description></description>
	<lastBuildDate>Fri, 10 Feb 2012 23:24:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4-alpha-19719</generator>
		<item>
		<title>Solving Another Romantic Problem: Weak Pronouns</title>
		<link>http://mitcho.com/blog/projects/solving-another-romantic-problem/</link>
		<comments>http://mitcho.com/blog/projects/solving-another-romantic-problem/#comments</comments>
		<pubDate>Tue, 12 May 2009 08:09:31 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[Catalan]]></category>
		<category><![CDATA[French]]></category>
		<category><![CDATA[Italian]]></category>
		<category><![CDATA[Modern Greek]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[Portuguese]]></category>
		<category><![CDATA[romance languages]]></category>
		<category><![CDATA[Spanish]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2035</guid>
		<description><![CDATA[Yesterday I blogged on how to deal with portmanteau&#8217;ed prepositions in Ubiquity Parser 2, a common problem in various romance languages. Today I&#8217;ll propose an approach to another romantic problem. The problem: Weak pronouns in romance languages (as well as some other languages) have a special property where they cliticize to the verb, moving from [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/solving-a-romantic-problem-portmanteaued-prepositions/' rel='bookmark' title='Solving a Romantic Problem: Portmanteau&#8217;ed Prepositions'>Solving a Romantic Problem: Portmanteau&#8217;ed Prepositions</a></li>
<li><a href='http://mitcho.com/blog/observation/wheres-the-verb/' rel='bookmark' title='Where&#8217;s The Verb?'>Where&#8217;s The Verb?</a></li>
<li><a href='http://mitcho.com/blog/projects/this-week-on-ubiquity-parser-the-next-generation/' rel='bookmark' title='This week on Ubiquity Parser: The Next Generation'>This week on Ubiquity Parser: The Next Generation</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p><em>Yesterday I blogged on <a href="http://mitcho.com/blog/projects/solving-a-romantic-problem-portmanteaued-prepositions/">how to deal with portmanteau&#8217;ed prepositions in Ubiquity Parser 2</a>, a common problem in various romance languages. Today I&#8217;ll propose an approach to another romantic problem.</em></p>

<h3>The problem:</h3>

<p>Weak pronouns in <a href="http://en.wikipedia.org/wiki/romance languages">romance languages</a> (as well as some other languages) have a special property where they <em>cliticize</em> to the verb, moving from their regular argument position to a position next to the verb. For example, in French, we have an imperative like (1) with gloss as (2):</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
</pre></td><td class="code"><pre class="fr" style="font-family:monospace;">Envoyez  la  lettre à  Pierre!
send.IMP the letter to Pierre</pre></td></tr></table></div>


<p>If we replace <em>la lettre</em> or <em>à Pierre</em> with a weak pronoun (<em>le</em>/<em>la</em>, &#8220;it&#8221;, or <em>lui</em>, &#8220;to him&#8221;, respectively), that pronoun moves next to the verb; in particular, (5) exemplifies the change in word order. Replacing both arguments with weak pronouns creates the stacked clitic form of (7).<sup id="fnref:3"><a href="#fn:3" rel="footnote">1</a></sup></p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>3
4
5
6
7
8
</pre></td><td class="code"><pre class="fr" style="font-family:monospace;">Envoyez-la à  Pierre!
send   -it to Pierre
Envoyez-lui la  lettre!
send   -him the letter
Envoyez-le-lui!
send   -it-him</pre></td></tr></table></div>


<p>The fact that these weak pronouns are attached to the verb and lack separate delimiters means that we will need a separate mechanism to parse these arguments: indeed, this functionality has been planned in <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2">Ubiquity Parser 2</a> as &#8220;step 3&#8221;. Here I&#8217;ll examine some data and discuss a strategy for the parsing of weak pronouns.</p>

<p><span id="more-2035"></span></p>

<h3>Weak pronouns in Ubiquity</h3>

<p>In Ubiquity the only pronoun we currently deal with is the deictic <code>object</code>-role anaphor, like &#8220;it,&#8221; &#8220;this,&#8221; etc. in English.<sup id="fnref:1"><a href="#fn:1" rel="footnote">2</a></sup> In addition, as these weak pronoun clitics cannot by definition be embedded within a larger noun phrase, their referent would constitute the entire <code>object</code> argument. As such, it is most logical to place clitic handling before argument structure parsing and simply hand the argument parser the argument string without the clitic.</p>

<h3>Marking the clitic</h3>

<p>We can classify languages with cliticized weak pronouns into two cases based on their processing considerations: languages that overtly mark the clitic and those which do not.</p>

<h4>Languages which delimit the clitic</h4>

<p>Some languages such as French (see above) clearly mark the boundary between the verb and the clitic. It will be relatively easy to parse weak pronouns in such languages as we can simply <a href="http://ubiquity.mozilla.com/trac/ticket/665">insert a no-width space</a> between the verb and the clitic. A list of clitics can then be designated in the parser (much like anaphora are now) and these weak pronouns can be interpreted as the selection (or &#8220;this&#8221;-referent).<sup id="fnref:2"><a href="#fn:2" rel="footnote">3</a></sup></p>

<p><strong>Portuguese:</strong> (from <a href="http://email.eva.mpg.de/~cysouw/pdf/cysouwDGFS.pdf">Cysouw 2003</a>)</p>


<div class="wp_syntax"><div class="code"><pre class="pr" style="font-family:monospace;">Come-o
eat -it</pre></div></div>


<p><strong>Catalan:</strong> (from <a href="http://www.cau.cat/blog/">Toni Hermoso Pulido</a>)</p>


<div class="wp_syntax"><div class="code"><pre class="ca" style="font-family:monospace;">Cerca-ho
search-it</pre></div></div>


<p><strong>Modern Greek:</strong> (from <a href="http://aix1.uottawa.ca/~romlab/pubs/RiveroTerzi.1995.pdf">Rivero and Terzi 1995</a>; I know, I know, Greek&#8217;s not a romance language, but it has weak pronoun clitics too&#8230; it&#8217;s all good.)</p>

<p>Modern Greek actually inserts a space between the verb and weak pronouns.</p>


<div class="wp_syntax"><div class="code"><pre class="ca" style="font-family:monospace;">Diavase to
read   -it</pre></div></div>
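
<p>For the hyphen-delimited cases above, the clitic-stripping step might be sketched as follows. This is only an illustration under stated assumptions: the <code>CLITICS</code> table and the <code>splitDelimitedClitics</code> function are hypothetical names, not part of Parser 2, and the clitic lists are deliberately incomplete. Space-delimited languages such as Modern Greek, and trailing punctuation, are not handled.</p>

```javascript
// Hypothetical, incomplete clitic lists keyed by language code.
const CLITICS = {
  fr: ['le', 'la', 'les', 'lui'],
  pt: ['o', 'a'],
  ca: ['ho'],
};

// Split the first word of the input into a verb and its hyphen-attached
// clitics. Returns null when nothing recognizable is attached to the verb.
function splitDelimitedClitics(input, lang) {
  const known = CLITICS[lang] || [];
  const words = input.trim().split(/\s+/);
  const [verb, ...attached] = words[0].split('-');
  const isClitic = (part) => known.includes(part.toLowerCase());
  if (attached.length === 0 || !attached.every(isClitic)) return null;
  // The remaining words are handed on to argument parsing without the clitic.
  return { verb: verb, clitics: attached, rest: words.slice(1).join(' ') };
}
```

<p>With these assumptions, <code>splitDelimitedClitics('Envoyez-la à Pierre', 'fr')</code> separates the verb <em>Envoyez</em> from the clitic <em>la</em> and leaves <em>à Pierre</em> for the argument parser, while an input with no attached clitic returns <code>null</code>.</p>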


<h4>Languages which do not delimit the clitic</h4>

<p>Some languages do not insert any delimiter between the verb and the weak pronoun, essentially entering them as a single word (in the string sense, at least). These cases may be more difficult to parse, especially as there may be sound changes to the verb stem itself.</p>

<p><strong>Italian:</strong> (first example from <a href="http://books.google.com/books?id=tnXJVbGpMfEC">Kayne 1994</a>)</p>

<p>Italian is a case where some weak pronouns actually fuse with the verb in imperatives, much like Italian prepositions, which, as I noted yesterday, have an elaborate system of portmanteau&#8217;ed forms.</p>


<div class="wp_syntax"><div class="code"><pre class="it" style="font-family:monospace;">Fallo
do-it
Mangialo
eat  -it</pre></div></div>


<p><strong>Spanish:</strong> (first example from <a href="http://aix1.uottawa.ca/~romlab/pubs/RiveroTerzi.1995.pdf">Rivero and Terzi 1995</a>, second from <a href="http://www.cau.cat/blog/">Toni Hermoso Pulido</a>)</p>

<p>Spanish is the same way:</p>


<div class="wp_syntax"><div class="code"><pre class="es" style="font-family:monospace;">Léelo
read-it
Búscalo
search-it</pre></div></div>
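
<p>For undelimited forms like the Spanish examples above, one possible starting point is a suffix pattern plus removal of the written accent that the imperative picks up when a clitic attaches. The sketch below is an assumption-laden illustration, not Parser 2 code: <code>ES_CLITIC</code>, <code>stripAcute</code> and <code>splitFusedClitic</code> are hypothetical names, the clitic list is incomplete, and the pattern will also match ordinary words that merely end in <em>lo</em> or <em>la</em>, so the results can only be treated as candidates.</p>

```javascript
// Hypothetical pattern: a non-empty stem followed by a Spanish object clitic.
const ES_CLITIC = /^(.+?)(los|las|lo|la)$/i;

// Remove acute accents only (U+0301 after decomposition); ñ and ü are kept.
function stripAcute(text) {
  return text.normalize('NFD').replace(/\u0301/g, '').normalize('NFC');
}

// Return a candidate verb/clitic split for a single word, or null.
function splitFusedClitic(word) {
  const match = word.match(ES_CLITIC);
  if (match === null) return null;
  return { verb: stripAcute(match[1]), clitic: match[2].toLowerCase() };
}
```

<p>Under these assumptions <em>Búscalo</em> yields the candidate verb <em>Busca</em> with clitic <em>lo</em>. The same suffix rule would not recover <em>fa</em> from Italian <em>Fallo</em>, where the consonant doubles, which illustrates why verb stem changes make these languages harder.</p>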


<h3>Displaying the suggestion</h3>

<p>The current Ubiquity handling of anaphora (aka &#8220;magic words&#8221;) involves a display of the selection (replacement) text in a stylized way. One problem with clitics may be how to visually present this replacement to the user.</p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/05/picture-11.png" alt="Picture 1.png" border="0" width="284" height="160" /></center></p>

<p>For languages with a delimiter such as French we could simply present the selection as an object right after the verb, without the hyphen.</p>

<table>
<tr><th>input:</th><td>traduisez-le (translate-it)</td></tr>
<tr><th>suggestion:</th><td>traduisez <span style='  padding: 2px;
  -moz-border-radius: 3px;
  display: inline-block;
  font-variant: small-caps;
  background-color: #BBB;
  color: #333;
  position: relative;
  top: -2px;
  font-size: 8pt;
  font-weight: normal;
  border: 1px solid #777;'>selection</span></td></tr>
</table>

<p>Things may be more complicated, however, in languages where the clitic is not delimited from the verb, or where the verb form itself has changed due to the attachment of the clitic.</p>

<h3>Conclusion</h3>

<p>In this blog post I&#8217;ve tried to lay out some of the weak pronoun phenomena relevant to Ubiquity with some ideas on how to implement their parsing. I believe parsing weak pronouns should be relatively straightforward in languages with delimiters. For those which do not have delimiters, some creativity may be required in building regular expressions or rules to detect the clitics and in presenting these suggestions to the user.</p>

<p><strong>Does your language have weak pronoun clitics? What do you think will be the challenges in trying to parse these arguments?</strong></p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:3">
<p>Note that the reverse order of &#8220;Envoyez-lui-le&#8221; is ungrammatical&#8230; fortunately we most likely will not have to deal with multiple clitics&#8230; see footnote two below.&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:1">
<p>This is not so much an informed decision that we should not do different kinds of anaphors but simply that we haven&#8217;t gotten around to implementing it. I personally am not sure, however, whether there is a real need for parsing for anaphors for roles other than <code>object</code> (for example, French <em>lui</em> as seen above which would be a <code>goal</code> anaphor).&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:2">
<p>There is, however, a question of whether weak pronoun replacement should be obligatory or not: that is, if we see a regular anaphor right now such as &#8220;this,&#8221; we make two copies of the parse: one with the replacement, one without. In the case where we detect a weak pronoun clitic, should the replacement be obligatory? I believe it should be, though, as with many other Parser 2 features, I believe we can continue to parse other options with no replacement but let the scoring system kill those parses off. If a verb has a clitic attached to it but we do not remove it, it most likely will do very poorly in scoring anyway.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/solving-another-romantic-problem/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Solving a Romantic Problem: Portmanteau&#8217;ed Prepositions</title>
		<link>http://mitcho.com/blog/projects/solving-a-romantic-problem-portmanteaued-prepositions/</link>
		<comments>http://mitcho.com/blog/projects/solving-a-romantic-problem-portmanteaued-prepositions/#comments</comments>
		<pubDate>Mon, 11 May 2009 05:19:17 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[Catalan]]></category>
		<category><![CDATA[French]]></category>
		<category><![CDATA[Italian]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[portmanteau]]></category>
		<category><![CDATA[romance languages]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2019</guid>
		<description><![CDATA[The problem: In many romance languages, prepositions and articles often form portmanteau morphs, combining to form a single word. Some examples include (French) à + le > au, de + le > du, (Catalan) a + el > al, de + els > dels, per + el > pel. Italian has a particularly productive system [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/observation/wheres-the-verb/' rel='bookmark' title='Where&#8217;s The Verb?'>Where&#8217;s The Verb?</a></li>
<li><a href='http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/' rel='bookmark' title='Adding Your Language to Ubiquity Parser 2'>Adding Your Language to Ubiquity Parser 2</a></li>
<li><a href='http://mitcho.com/blog/observation/scoring-and-ranking-suggestions/' rel='bookmark' title='Scoring and Ranking Suggestions'>Scoring and Ranking Suggestions</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<h3>The problem:</h3>

<p>In many <a href="http://en.wikipedia.org/wiki/romance languages">romance languages</a>, prepositions and articles often form <a href="http://en.wikipedia.org/wiki/portmanteau">portmanteau morphs</a>, combining to form a single word.<sup id="fnref:2"><a href="#fn:2" rel="footnote">1</a></sup> Some examples include (French) à + le > au, de + le > du, (Catalan) a + el > al, de + els > dels, per + el > pel. Italian has a particularly productive system of portmanteau&#8217;ed prepositions and articles&#8230; I refer you to the <a href="http://en.wikipedia.org/wiki/Contraction (grammar)#Italian">contraction</a> article on Wikipedia.</p>

<p>As I <a href="http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/">noted a couple weeks ago</a>, however, some combinations do not form portmanteaus.<sup id="fnref:1"><a href="#fn:1" rel="footnote">2</a></sup></p>

<p><span id="more-2019"></span>
<strong>French:</strong></p>

<ol>
<li>à + le > au</li>
<li>à + la > à la</li>
</ol>

<p>The problem with this is that if we use both <em>à</em> and <em>au</em> as delimiters, we may end up passing the definite article to the verb as part of the argument in some cases, but not in other cases.</p>

<ol>
<li>&#8220;<strong>à</strong> la table&#8221; = &#8220;<strong>to</strong> the table&#8221;</li>
<li>&#8220;<strong>au</strong> chat&#8221; = &#8220;<strong>to the</strong> cat&#8221;</li>
</ol>

<h3>The solution:</h3>

<p>The solution is a new step in <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2">the Parser 2 process</a> which normalizes the form of arguments. Each language&#8217;s parser can now optionally define a <code>normalizeArgument()</code> method which takes an argument and returns a list of normalized alternates. Normalized arguments are returned in the form of <code>{prefix: '', newInput: '', suffix: ''}</code>. For example, if you feed &#8220;la table&#8221; to the French <code>normalizeArgument()</code>, it ought to return</p>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#123;</span>prefix<span style="color: #339933;">:</span> <span style="color: #3366CC;">'la '</span><span style="color: #339933;">,</span> newInput<span style="color: #339933;">:</span> <span style="color: #3366CC;">'table'</span><span style="color: #339933;">,</span> suffix<span style="color: #339933;">:</span> <span style="color: #3366CC;">''</span><span style="color: #009900;">&#125;</span><span style="color: #009900;">&#93;</span></pre></div></div>


<p>If there are no possible normalizations, <code>normalizeArgument()</code> should simply return <code>[]</code>. Each alternative returned by <code>normalizeArgument()</code> is substituted into a copy of the possible parses just before nountype detection. The prefixes and suffixes are stored in the argument (as <code>inactivePrefix</code> and <code>inactiveSuffix</code>) so they can be incorporated into the suggestion display.</p>

<p>Here, for example, is how the inactive prefix &#8220;l&#8217;&#8221; is displayed in <a href="chrome://parser-demo/content/index.html">the parser demo</a>. This way the user is told that the &#8220;l&#8217;&#8221; prefix is being ignored, and the nountype detection and verb action can act on the argument &#8220;English&#8221;.<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup> (In the future, of course, we could teach this nountype to accept the Catalan &#8220;anglès&#8221;.)</p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/05/picture-1.png" alt="Picture 1.png" border="0" width="320" height="29" /></center></p>

<p>The easiest way to produce this output is to use the <a href="https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/String/match"><code>String.match()</code></a> method. For example <code>normalizeArgument()</code> code, I refer you to the <a href="http://ubiquity.mozilla.com/hg/ubiquity-firefox/file/12f5d9abf011/ubiquity/modules/parser/new/ca.js">Catalan</a> and <a href="http://ubiquity.mozilla.com/hg/ubiquity-firefox/file/12f5d9abf011/ubiquity/modules/parser/new/fr.js">French</a> parser files.</p>
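
<p>As a rough illustration of the <code>String.match()</code> approach, a French <code>normalizeArgument()</code> could look something like the sketch below. This is not the code in the linked <code>fr.js</code>; the regular expression and the list of articles are my own assumptions, and in a real language file the function would be defined as a method on the parser object.</p>

```javascript
// Hypothetical sketch: peel a leading French definite article off an argument.
// Handles "le ", "la ", "les " and the elided "l'" (straight or curly apostrophe).
function normalizeArgument(input) {
  const match = input.match(/^(le |la |les |l['\u2019])(.+)$/i);
  // No article found: report that there are no normalized alternates.
  if (match === null) return [];
  return [{ prefix: match[1], newInput: match[2], suffix: '' }];
}
```

<p>Given &#8220;la table&#8221; this returns a single alternate with prefix <code>'la '</code> and newInput <code>'table'</code>, matching the shape shown earlier, and given an argument without an article it returns <code>[]</code>.</p>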

<p>I hope that this solution will help make Ubiquity with Parser 2 feel <a href="http://mitcho.com/blog/projects/how-natural-should-a-natural-interface-be/">more natural</a> for many romance languages.</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:2">
<p>Thanks to <a href="http://people.ucsc.edu/~jpobrien/">Jeremy O&#8217;Brien</a> for helping me figure out how to refer to this phenomenon.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:1">
<p>This also relates to the issue of <a href="http://ubiquity.mozilla.com/trac/ticket/671">parsing multi-word delimiters</a>, though the argument normalization strategy covered here should reduce the necessity of multi-word delimiters.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:3">
<p>Thank you to contributor <a href="http://www.cau.cat/blog/">Toni Hermoso Pulido</a> for our first attempt at a Catalan parser!&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/solving-a-romantic-problem-portmanteaued-prepositions/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Adding Your Language to Ubiquity Parser 2</title>
		<link>http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/</link>
		<comments>http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/#comments</comments>
		<pubDate>Wed, 29 Apr 2009 11:44:20 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[how to]]></category>
		<category><![CDATA[argument structure]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[case marking]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[French]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[l10n]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[localization]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[semantic roles]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1956</guid>
		<description><![CDATA[NOTE: This blog post has now been added to the Ubiquity wiki and is updated there. Please disregard this article and instead follow these instructions. You&#8217;ve seen the video. You speak another language. And you&#8217;re wondering, &#8220;how hard is it to add my language to Ubiquity with Parser 2?&#8221; The answer: not that hard. With [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/' rel='bookmark' title='Ubiquity Parser: The Next Generation Demo'>Ubiquity Parser: The Next Generation Demo</a></li>
<li><a href='http://mitcho.com/blog/projects/rolling-out-the-roles/' rel='bookmark' title='Rolling out the Roles'>Rolling out the Roles</a></li>
<li><a href='http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/' rel='bookmark' title='Foxkeh demos Ubiquity Parser: The Next Generation'>Foxkeh demos Ubiquity Parser: The Next Generation</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p><strong>NOTE: This blog post has now been added to the Ubiquity wiki and is updated there. Please disregard this article and instead follow <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2/Localization_Tutorial">these instructions</a>.</strong></p>

<p>You&#8217;ve <a href="http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/">seen the video</a>. You speak another language. And you&#8217;re wondering, <strong>&#8220;how hard is it to add my language to Ubiquity with Parser 2?&#8221;</strong> The answer: <strong>not that hard.</strong> With a little bit of JavaScript and knowledge of and interest in your own language, you&#8217;ll be able to get at least rudimentary Ubiquity functionality in your language. Follow along in this step by step guide and please <a href="http://ubiquity.mozilla.com/trac/ticket/662">submit your (even incomplete) language files</a>!</p>

<p><em>As Ubiquity Parser 2 evolves, there is a chance that this specification will change in the future. Keep abreast of such changes on the <a href="http://ubiquity.mozilla.com/planet/">Ubiquity Planet</a> and/or <a href="http://mitcho.com/blog/">this blog</a> (<a href="http://mitcho.com/blog/feed/blog-only/">RSS</a>).</em></p>

<p><span id="more-1956"></span></p>

<h3>Set up your environment</h3>

<p>If you&#8217;re new to Ubiquity core development, you&#8217;ll want to first read the <a href="http://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_0.1_Development_Tutorial">Ubiquity 0.1 Development Tutorial</a> to learn how to get a live copy of the Ubiquity repository using <a href="http://en.wikipedia.org/wiki/Mercurial">Mercurial</a>. Once you&#8217;ve set up your Firefox profile to use this development version, make sure to try changing the <code>extensions.ubiquity.parserVersion</code> value to 2 in <code>about:config</code> (as seen in <a href="http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/">this demo video</a>) to verify that Parser 2 is working for you.</p>

<p>As you read along, you may find it beneficial to follow along in the languages currently included in Parser 2: <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/en.js">English</a>, <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/ja.js">Japanese</a>, <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/pt.js">Portuguese</a>, and <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/sv.js">Swedish</a> (and the incomplete <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/zh.js">Chinese</a> and <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/fr.js">French</a>).</p>

<h3>The structure of the language file</h3>

<p>Each language in Parser 2 gets its own file which acts as a <a href="https://developer.mozilla.org/En/Using_JavaScript_code_modules">JavaScript module</a>. You&#8217;ll need to look up the <a href="http://en.wikipedia.org/wiki/List of ISO 639-1 codes">ISO 639-1 code for your language</a>&#8230; Here we&#8217;ll use English (code <code>en</code>) as an example, so the JavaScript language file would be called <code>en.js</code> and go in the <code>/ubiquity/modules/parser/new/</code> directory of the repository.</p>

<p>Here is the basic template for a Ubiquity Parser 2 language file:</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
</pre></td><td class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">var</span> EXPORTED_SYMBOLS <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span><span style="color: #3366CC;">&quot;makeEnParser&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000066; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">typeof</span> window<span style="color: #009900;">&#41;</span> <span style="color: #339933;">==</span> <span style="color: #3366CC;">'undefined'</span><span style="color: #009900;">&#41;</span> <span style="color: #006600; font-style: italic;">// kick it chrome style</span>
  Components.<span style="color: #660066;">utils</span>.<span style="color: #003366; font-weight: bold;">import</span><span style="color: #009900;">&#40;</span><span style="color: #3366CC;">&quot;resource://ubiquity/modules/parser/new/parser.js&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #003366; font-weight: bold;">function</span> makeEnParser<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #003366; font-weight: bold;">var</span> en <span style="color: #339933;">=</span> <span style="color: #003366; font-weight: bold;">new</span> Parser<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'en'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
...
&nbsp;
  <span style="color: #000066; font-weight: bold;">return</span> en<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>


<p>After lines 1-4 which set up the <a href="https://developer.mozilla.org/En/Using_JavaScript_code_modules">JavaScript module</a>, everything else is wrapped in a factory function called <code>makeLaParser</code> (for Latin) or <code>makeEnParser</code> (for English, <code>en</code>) or <code>makeFrParser</code> (for French, <code>fr</code>), etc. This function initializes the new <code>Parser</code> object (line 7) with the appropriate language code, sets a bunch of parameters (elided above) and returns it. That&#8217;s it!</p>

<p>Now let&#8217;s walk through some of the parameters you must set to get your language working. For reference, the properties the language parser object is required to have are: <code>branching</code>, <code>anaphora</code>, and <code>roles</code>.</p>

<h3>Identifying your branching parameter</h3>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;">  en.<span style="color: #660066;">branching</span> <span style="color: #339933;">=</span> <span style="color: #3366CC;">'right'</span><span style="color: #339933;">;</span> <span style="color: #006600; font-style: italic;">// or 'left'</span></pre></div></div>


<p>One of the first things you&#8217;ll have to set for your parser is <strong>the <code>branching</code> parameter</strong>. Ubiquity Parser 2 uses the branching parameter to decide which direction to look for an argument after finding a delimiter or &#8220;role marker&#8221; (most often, these are <a href="http://en.wikipedia.org/wiki/adposition">prepositions or postpositions</a>). For example, in English &#8220;from&#8221; is a delimiter for the <code>source</code> role and its argument is on its right.</p>

<table>
<tr><td>&nbsp;</td><td>&nbsp;</td><td colspan='2' style='background: transparent url(http://mitcho.com/i/cccarrow-right.png) no-repeat right bottom'>&nbsp;</td></tr>
<tr><td><b>to</b></td><td>Mary</td><td><b>from</b></td><td>John</td></tr>
</table>

<p>So &#8220;John&#8221; is a possible argument for the <code>source</code> role, but &#8220;Mary&#8221; should not be. Ubiquity can figure this out because English has the property <code>en.branching = 'right'</code>.</p>

<p>In Japanese, on the other hand, the argument of a delimiter like から (&#8220;from&#8221;) is found on the left of that delimiter, so <code>ja.branching = 'left'</code>.</p>

<table>
<tr><td colspan='2' style='background: transparent url(http://mitcho.com/i/cccarrow-left.png) no-repeat left bottom'>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><td>メアリー</td><td><b>-から</b></td><td>ジョン</td><td><b>-に</b></td></tr>
<tr><td>Mary</td><td><b>from</b></td><td>John</td><td><b>to</b></td></tr>
</table>

<p>In general, if your language has prepositions, you should use <code>.branching = 'right'</code> and if your language has postpositions, you can use <code>.branching = 'left'</code>.</p>
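
<p>To make the effect of the branching parameter concrete, here is a minimal sketch of how a delimiter&#8217;s argument could be located in each direction. It is only an illustration: <code>argumentFor</code> is a hypothetical helper, not a Parser 2 function, and it assumes the input has already been split into words and that the argument runs up to the next delimiter or the edge of the input.</p>

```javascript
// Hypothetical helper: collect the words belonging to the delimiter at
// delimiterIndex, looking right or left depending on the branching value.
function argumentFor(words, delimiterIndex, delimiters, branching) {
  const step = branching === 'right' ? 1 : -1;
  const found = [];
  for (let i = delimiterIndex + step; words[i] !== undefined; i += step) {
    // Stop at the next delimiter: its neighbour belongs to another role.
    if (delimiters.includes(words[i])) break;
    if (step === 1) found.push(words[i]);
    else found.unshift(words[i]);
  }
  return found.join(' ');
}
```

<p>With <code>['to', 'Mary', 'from', 'John']</code> and <code>'right'</code>, the delimiter &#8220;from&#8221; picks up &#8220;John&#8221; and never &#8220;Mary&#8221;; with the Japanese words from the table above and <code>'left'</code>, から picks up メアリー.</p>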

<p><strong>For more info</strong>:</p>

<ul>
<li>see <a href="http://en.wikipedia.org/wiki/Branching (linguistics)">branching</a> on Wikipedia.</li>
</ul>

<h3>Defining your roles</h3>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;">  en.<span style="color: #660066;">roles</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'goal'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'to'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'source'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'from'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'position'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'at'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'position'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'on'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'alias'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'as'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'instrument'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'using'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'instrument'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'with'</span><span style="color: #009900;">&#125;</span>
  <span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span></pre></div></div>


<p>The second required property is <code>roles</code>, a list which pairs semantic roles with their delimiters. Each entry has a <code>role</code> from the <a href="http://mitcho.com/blog/projects/rolling-out-the-roles/">inventory of semantic roles</a> and a corresponding <code>delimiter</code>. Note that this mapping can be <a href="http://en.wikipedia.org/wiki/many-to-many (data model)">many-to-many</a>, i.e., each role can have multiple possible delimiters and a single delimiter can be shared by different roles. Try to cover all of the roles in the inventory.</p>
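
<p>To make the many-to-many point concrete, here is a small sketch that queries the English list in both directions. The two lookup helpers, <code>delimitersFor()</code> and <code>rolesFor()</code>, are hypothetical names used only for this illustration; they are not functions provided by the parser.</p>

```javascript
// The English role list from above, as a plain array.
const roles = [
  { role: 'goal', delimiter: 'to' },
  { role: 'source', delimiter: 'from' },
  { role: 'position', delimiter: 'at' },
  { role: 'position', delimiter: 'on' },
  { role: 'alias', delimiter: 'as' },
  { role: 'instrument', delimiter: 'using' },
  { role: 'instrument', delimiter: 'with' }
];

// Hypothetical lookup helpers (assumptions for illustration only).
const delimitersFor = (role) =>
  roles.filter((entry) => entry.role === role).map((entry) => entry.delimiter);

const rolesFor = (delimiter) =>
  roles.filter((entry) => entry.delimiter === delimiter).map((entry) => entry.role);

// One role, several delimiters.
const instrumentDelimiters = delimitersFor('instrument');

// One delimiter, possibly several roles (only one in this English list).
const fromRoles = rolesFor('from');

console.log(instrumentDelimiters, fromRoles);
```

<p>In this English list no delimiter is shared between roles, but nothing in the data structure prevents adding a second entry with the same <code>delimiter</code> and a different <code>role</code>.</p>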

<p><strong>For more info:</strong></p>

<ul>
<li><a href="http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/">Writing commands with semantic roles</a></li>
<li><a href="http://mitcho.com/blog/projects/rolling-out-the-roles/">the proposed inventory of semantic roles</a></li>
<li>Wikipedia entry on <a href="http://en.wikipedia.org/wiki/thematic relations">thematic relations</a></li>
</ul>

<h3>Entering your anaphora (&#8220;magic words&#8221;)</h3>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;">  en.<span style="color: #660066;">anaphora</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span><span style="color: #3366CC;">&quot;this&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;that&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;it&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;selection&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;him&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;her&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;them&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span></pre></div></div>


<p>The final required property is <code>anaphora</code>, which takes a list of &#8220;magic words&#8221;. Currently the parser makes no distinction between the different <a href="http://en.wikipedia.org/wiki/deixis">deictic</a> <a href="http://en.wikipedia.org/wiki/anaphora (linguistics)">anaphora</a>, even though they might refer to different things.</p>
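
<p>This post does not spell out what the parser does with these words. Assuming that a magic word stands in for something like the user&#8217;s current selection, the idea can be sketched as follows; <code>substituteAnaphora()</code> and its <code>selection</code> parameter are hypothetical and only meant as an illustration.</p>

```javascript
// The English list of magic words from above.
const anaphora = ['this', 'that', 'it', 'selection', 'him', 'her', 'them'];

// Hypothetical helper (an assumption, not Ubiquity code): swap any magic
// word in an argument for the text it is assumed to refer to.
function substituteAnaphora(argText, selection) {
  return argText
    .split(' ')
    .map((word) => (anaphora.includes(word) ? selection : word))
    .join(' ');
}

// All magic words are treated alike, mirroring the lack of distinction
// mentioned above.
const fromThis = substituteAnaphora('this', 'Hello world');
const fromHer = substituteAnaphora('her', 'Hello world');
const untouched = substituteAnaphora('Mary', 'Hello world');

console.log(fromThis, fromHer, untouched);
```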

<h3>Special cases</h3>

<p>Some special language features can be handled by overriding the default behavior from <code>Parser</code>. Many of these features are still in the works, however, so we&#8217;d love to get your comments!</p>

<h4>Languages with no spaces</h4>

<p>If your language does not delimit arguments (or words, more generally) with spaces, you&#8217;ll need to write a custom <code>wordBreaker()</code> function and set <code>usespaces = false</code> and <code>joindelimiter = ''</code>. For examples, take a look at the <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/ja.js">Japanese</a> or <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/zh.js">Chinese</a> parser.</p>
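
<p>The linked files are the real reference. Purely as a sketch of the general idea, a word breaker for a language without spaces might cut the input around known delimiters. The function below is hypothetical: its name, its signature and its return value are assumptions, and it does not reproduce what <code>ja.js</code> actually does.</p>

```javascript
// Delimiters used in the Japanese example above.
const delimiters = ['から', 'に'];

// Hypothetical sketch: split unspaced input into pieces, keeping the
// delimiters themselves as separate pieces.
function breakWords(input) {
  // The capture group makes split() keep the matched delimiters.
  const pattern = new RegExp('(' + delimiters.join('|') + ')', 'g');
  return input.split(pattern).filter((piece) => piece !== '');
}

const pieces = breakWords('メアリーからジョンに');
console.log(pieces);
```

<p>A naive split like this will also cut inside any word that happens to contain a delimiter string, which is one reason a real word breaker needs more care.</p>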

<h4>Case marking languages</h4>

<p><strike>If you have a strongly <a href="http://en.wikipedia.org/wiki/grammatical case">case-marked</a> language, you&#8217;ll have to write some rules to identify those different cases in <code>wordBreaker()</code> and then add some extra <code>roles</code> for these case markers, but for a number of languages the current design does not allow an elegant solution for parsing such arguments. Updates to this issue will be posted to <a href="http://ubiquity.mozilla.com/trac/ticket/663">this trac ticket</a>.</p>

<p>In the meantime, however, if you could write a parser even with only the prepositions/postpositions in your language, that would be a great benefit in getting started in your language.</strike> <strong>UPDATE</strong>: a proposal on how to deal with strongly case-marked languages has been written here: <a href="http://mitcho.com/blog/projects/in-case-of-case/">In Case of Case&#8230;</a>.</p>

<h4>Stripping articles</h4>

<p>Some languages have delimiters which combine with articles. For example, in French, the preposition &#8220;à&#8221; combines with the masculine definite article &#8220;le&#8221; but not with the feminine &#8220;la&#8221;:</p>

<ol>
<li>à + la = à la</li>
<li>à + le = au</li>
</ol>

<p>You can add both &#8220;à&#8221; and &#8220;au&#8221; as delimiters of the <code>goal</code> role, but then you will get feminine arguments back with the determiner (e.g. &#8220;la table&#8221;) while masculine arguments will be parsed without a determiner (e.g. &#8220;chat&#8221;).</p>

<ol>
<li>&#8220;<b>à</b> la table&#8221; = &#8220;<b>to</b> the table&#8221;</li>
<li>&#8220;<b>au</b> chat&#8221; = &#8220;<b>to the</b> cat&#8221;</li>
</ol>

<p><strike>One possible solution to this is to write a custom <code>cleanArgument()</code> method. After arguments have been parsed and placed in their appropriate roles, each argument text (say, &#8220;la table&#8221; or &#8220;chat&#8221;) is passed to <code>cleanArgument()</code>. You can simply write a <code>cleanArgument()</code> to strip off any &#8220;la &#8221; at the beginning of the input and return it, and both example inputs will get normalized arguments: &#8220;table&#8221; and &#8220;chat&#8221;, respectively.</strike> <strong>UPDATE</strong>: For more up-to-date information on how to deal with these types of articles, please see <a href="http://mitcho.com/blog/projects/solving-a-romantic-problem/">Solving a Romantic Problem</a>.</p>
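
<p>For reference only, the earlier <code>cleanArgument()</code> idea could look roughly like the sketch below. The standalone signature (one string in, one string out) is an assumption, and the linked post describes the more up-to-date approach.</p>

```javascript
// Sketch of the earlier idea, assuming cleanArgument() receives the
// argument text and returns a normalized version of it.
function cleanArgument(argText) {
  // Strip a leading "la " (any case); words that merely start with
  // "la", such as "lampe", are left alone because whitespace is required.
  return argText.replace(/^la\s+/i, '');
}

const cleaned = ['la table', 'chat', 'lampe'].map((text) => cleanArgument(text));
console.log(cleaned);
```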

<h3>Test your parser</h3>

<p>Now you can go into <code>about:config</code> and change <code>extensions.ubiquity.language</code> to be your language code and restart. All the verbs and nountypes at this point will remain the same as in the English version, but it should obey the argument structure (the word order and delimiters) of your language.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> If you run into any trouble, feel free to ask for help on the <a href="http://groups.google.com/group/ubiquity-i18n">Ubiquity i18n listhost</a> or find me on the Ubiquity IRC channel (mitcho @ irc.mozilla.org#ubiquity). Of course, once you&#8217;re at a good stopping point, please <a href="http://ubiquity.mozilla.com/trac/ticket/662">contribute your language file to Ubiquity</a>!</p>

<h3>More to come&#8230;</h3>

<p>At this point, you&#8217;ve only localized the <a href="http://en.wikipedia.org/wiki/argument structure">argument structure</a> of your language&#8230; additional work will be required to localize the nountypes and verb names, which is <a href="http://groups.google.com/group/ubiquity-i18n/browse_thread/thread/ab4d876b1ea02d4">the subject of ongoing discussion</a>&#8230; <a href="http://groups.google.com/group/ubiquity-i18n">join the Google Group</a> to get in on the discussion!</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>At this point in time it&#8217;s also possible to test your parser at <code>chrome://parser-demo/content/index.html</code> if you make a couple other changes to your code&#8230; for more information, watch the <a href="http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/">Foxkeh demos Ubiquity Parser TNG</a> video. This option gives you more debug info as well.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Foxkeh demos Ubiquity Parser: The Next Generation</title>
		<link>http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/</link>
		<comments>http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/#comments</comments>
		<pubDate>Wed, 01 Apr 2009 10:10:43 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[contribute]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[Foxkeh]]></category>
		<category><![CDATA[French]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[internationalization]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[l10n]]></category>
		<category><![CDATA[localization]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[screencast]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1725</guid>
		<description><![CDATA[I just made a screencast with Foxkeh to demo the Ubiquity next generation parser demo and to demonstrate how easy it is to add your own language. Foxkeh wants you to localize the parser into your language. How could you say no? ^^ Foxkeh demos Ubiquity Parser: The Next Generation from mitcho on Vimeo. There [...]
]]></description>
			<content:encoded><![CDATA[<p>I just made a screencast with <a href="http://foxkeh.jp">Foxkeh</a> to demo the <a href="http://mitcho.com/code/ubiquity/parser-demo/">Ubiquity next generation parser demo</a> and to demonstrate how easy it is to add your own language. Foxkeh wants you to localize the parser into your language. How could you say no? ^^</p>

<p><object width="649" height="365"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=3954284&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=3954284&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="649" height="365"></embed></object><br /><a href="http://vimeo.com/3954284">Foxkeh demos Ubiquity Parser: The Next Generation</a> from <a href="http://vimeo.com/mitchoyoshitaka">mitcho</a> on <a href="http://vimeo.com">Vimeo</a>.</p>

<p>There are some details which are not covered in this introductory video, such as how to deal with case marking languages or languages without spaces. Hopefully this&#8217;ll inspire some people to play with the demo, though. <strong>I&#8217;d love to hear your comments! ^^</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>This week on Ubiquity Parser: The Next Generation</title>
		<link>http://mitcho.com/blog/projects/this-week-on-ubiquity-parser-the-next-generation/</link>
		<comments>http://mitcho.com/blog/projects/this-week-on-ubiquity-parser-the-next-generation/#comments</comments>
		<pubDate>Fri, 27 Mar 2009 06:30:31 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[Danish]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[French]]></category>
		<category><![CDATA[humor]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[Japanese language]]></category>
		<category><![CDATA[l10n]]></category>
		<category><![CDATA[localization]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[update]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1680</guid>
		<description><![CDATA[Last week I released a proof-of-concept demo of the next generation Ubiquity parser design and it was also the focus of discussion in our weekly internationalization meeting. Christian Sonne even wrote a Danish plugin for it during the meeting, a testament to the pluggability of the new parser design. In addition, at the Ubiquity weekly [...]
]]></description>
			<content:encoded><![CDATA[<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/03/parsertng.png" alt="parsertng.png" border="0" width="649" height="277" /></p>

<p><a href="http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/">Last week</a> I released <a href="http://mitcho.com/code/ubiquity/parser-demo/">a proof-of-concept demo</a> of the <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">next generation Ubiquity parser design</a> and it was also the focus of discussion in <a href="https://wiki.mozilla.org/Labs/Ubiquity/Meetings/2009-03-24_i18n_Meeting">our weekly internationalization meeting</a>.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> Christian Sonne even wrote a Danish plugin for it during the meeting, a testament to the pluggability of the new parser design.</p>

<p>In addition, at <a href="https://wiki.mozilla.org/Labs/Ubiquity/Meetings/2009-03-25_Weekly_Meeting">the Ubiquity weekly meeting</a>, pushing this new parser into Ubiquity proper was identified as <a href="https://wiki.mozilla.org/Labs/Ubiquity/0.2_Roadmap_Proposals">a key goal of Ubiquity 0.2</a>, making frequent iteration and debate over this parser essential.</p>

<p>To that end, I&#8217;ll highlight some of the changes made to the parser demo <a href="http://bitbucket.org/mitcho/ubiquity-playground/">codebase</a> in the past week:
<span id="more-1680"></span></p>

<ul>
<li><a href="http://en.wikipedia.org/wiki/left-branching">left-branching</a> support and a Japanese parser</li>
<li>basic French parser</li>
<li>a timer display</li>
<li>Danish parser by Christian Sonne</li>
<li>synonyms: as an example, you can now use &#8220;purchase&#8221; or &#8220;buy,&#8221; both of which point to the same verb.</li>
<li>verb name localizations: you no longer need to use the English verb names with different languages. (Currently only Japanese has any verb localizations.)</li>
<li>a number of optimizations and corrections</li>
</ul>

<p>I encourage you to check out the demo again or <a href="http://bitbucket.org/mitcho/ubiquity-playground/">check out the source on BitBucket</a>.</p>

<h3><a href="http://mitcho.com/code/ubiquity/parser-demo/">➔ Ubiquity Parser: The Next Generation demo</a></h3>

<p>I&#8217;d love to get comments, patches, or additional parsers! Thanks! ^^</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>The weekly internationalization meeting, like all Ubiquity weekly meetings, is completely open to the public. We&#8217;d love to hear new voices contribute to the discussion! Take a look at <a href="https://wiki.mozilla.org/Labs/Ubiquity/Meetings">the schedule of upcoming meetings</a>.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/this-week-on-ubiquity-parser-the-next-generation/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

