<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>mitcho.com &#187; Dutch</title>
	<atom:link href="http://mitcho.com/blog/tag/dutch/feed/" rel="self" type="application/rss+xml" />
	<link>http://mitcho.com</link>
	<description></description>
	<lastBuildDate>Fri, 10 Feb 2012 23:24:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4-alpha-19719</generator>
		<item>
		<title>Big Issues and Small Issues with Parser 2</title>
		<link>http://mitcho.com/blog/projects/big-issues-and-small-issues-with-parser-2/</link>
		<comments>http://mitcho.com/blog/projects/big-issues-and-small-issues-with-parser-2/#comments</comments>
		<pubDate>Wed, 20 May 2009 03:05:53 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[Dutch]]></category>
		<category><![CDATA[German]]></category>
		<category><![CDATA[Greek]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[internationalization]]></category>
		<category><![CDATA[l10n]]></category>
		<category><![CDATA[localization]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2087</guid>
		<description><![CDATA[Jono and I had a good conversation this morning on IRC about the remaining Big Issues which are blocking the release of Parser 2 as the default parser for Ubiquity. Here are our Top 4 Big Issues: Some commands&#8217; preview&#8217;s and execute&#8217;s are not working properly (trac #652). This could be an underlying issue with [...]
]]></description>
			<content:encoded><![CDATA[<p><a href="http://jonoscript.wordpress.com/">Jono</a> and I had a good conversation this morning on <a href="irc://irc.mozilla.org/ubiquity">IRC</a> about the remaining Big Issues which are blocking the release of Parser 2 as the default parser for <a href="http://ubiquity.mozilla.com">Ubiquity</a>. Here are our <strong>Top 4 Big Issues</strong>:</p>

<ol>
<li><strong>Some commands&#8217; <code>preview</code> and <code>execute</code> methods are not working properly</strong> (<a href="http://ubiquity.mozilla.com/trac/ticket/653">trac #652</a>). This could be an underlying issue with some pipes not being rerouted correctly in Parser 2, or it could be that the commands have not been rewritten correctly to take advantage of Parser 2.</li>
<li><strong>Flesh out how to localize resources, like commands and nountypes.</strong> We <a href="http://groups.google.com/group/ubiquity-i18n/browse_thread/thread/ab4d876b1ea02d4">started a conversation</a> on this subject a few weeks ago but we never reached a resolution. This blocks issues 3 and 4 below.</li>
<li><strong>We need to standardize a format for commands for Parser 2.</strong> As <a href="https://wiki.mozilla.org/Labs/Ubiquity/Meetings/2009-05-13_Weekly_Meeting#Notes">noted in last week&#8217;s meeting</a> (among other places), Parser 2 will require at least some modification to all commands. Jono and I came up with a simple hybrid format for commands which specifies <code>takes</code> and <code>modifiers</code> for Parser 1 and <code>arguments</code> for Parser 2, but until we figure out exactly how the localization of commands will work, we can&#8217;t write a definitive standard.</li>
<li><strong>Enable nountype localization.</strong> While the most popular nountypes used are those that ship with Ubiquity, it is important to come up with a localization process which can apply to custom nountypes as well. Nountype localizations need the ability to either (1) replace the <code>_name</code> only, or (2) replace both the <code>_name</code> and the <code>suggest()</code> logic, as both cases will be necessary.</li>
</ol>
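<p>To make Big Issues 3 and 4 a little more concrete, here is a minimal JavaScript sketch of what a hybrid command and a nountype localization helper might look like. Only the property names <code>takes</code>, <code>modifiers</code>, <code>arguments</code>, <code>_name</code>, and <code>suggest()</code> come from the discussion above. The value shapes, the <code>role</code> field, the <code>arbText</code> nountype, and the <code>localizeNountype</code> helper are assumptions for illustration, not a settled format.</p>

```javascript
// Hypothetical stand-in nountype: accepts any text.
const arbText = {
  _name: "text",
  suggest(input) {
    return input ? [{ text: input }] : [];
  },
};

// Hybrid command: Parser 1 would read takes/modifiers, Parser 2 arguments.
// The shape of each value is an assumption for illustration.
const translateCommand = {
  name: "translate",
  takes: { input: arbText },   // Parser 1: direct object
  modifiers: { to: arbText },  // Parser 1: prepositional modifiers
  arguments: [                 // Parser 2: arguments (role names assumed)
    { role: "object", nountype: arbText },
    { role: "goal", nountype: arbText },
  ],
};

// Hypothetical helper covering both localization cases:
// (1) replace _name only, or (2) replace both _name and suggest().
function localizeNountype(base, localization) {
  return {
    ...base,
    _name: localization._name,
    suggest: localization.suggest || base.suggest,
  };
}

// Case 1: a Dutch name, keeping the original suggest() logic.
const dutchText = localizeNountype(arbText, { _name: "tekst" });

// Case 2: a new name and new suggest() logic.
const shoutText = localizeNountype(arbText, {
  _name: "TEXT",
  suggest: (input) => (input ? [{ text: input.toUpperCase() }] : []),
});
```

<p>Returning a new object rather than mutating the base nountype means the shipped nountype stays usable alongside any number of localized variants.</p>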

<p>Given that Big Issue 3 and Big Issue 4 are both dependent on Big Issue 2, there clearly needs to be a continued public discussion of how we should make these resources localizable. <strong>I look forward to this discussion taking place <a href="https://wiki.mozilla.org/Labs/Ubiquity/Meetings/2009-05-20_Weekly_Meeting">at tomorrow&#8217;s joint (general + i18n) Ubiquity meeting</a>.</strong></p>

<p>In other news, here are some <strong><small>Small Issues</small></strong>:</p>

<ol>
<li><strong>Add a switch for parser version and language settings</strong>: Jono&#8217;s already made a space for this in the new &#8220;Settings and Skins&#8221; page in <code>about:ubiquity</code>. He&#8217;s on it. Like a bonnet.</li>
<li><strong>Magic word (anaphor) substitution is not yet working properly.</strong> This needs to work both when there is an explicit magic word and when there are simply missing arguments.</li>
<li><strong>The position of suggested verbs is always sentence-initial</strong> (<a href="http://ubiquity.mozilla.com/trac/ticket/655">trac #655</a>). This also requires that we can specify whether verb name localizations are sentence-initial forms or sentence-final forms.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></li>
<li><a href="http://ubiquity.mozilla.com/trac/search?ticket=on&amp;q=new-parser">&#8230;</a></li>
</ol>
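<p>Small Issues 2 and 3 can be sketched in a few lines of JavaScript. Everything here is an assumption for illustration: the parse shape, the <code>magicWords</code> list, the <code>verbPosition</code> flag, and both function names are hypothetical and are not Parser 2&#8217;s actual API.</p>

```javascript
// Hypothetical list of magic words (anaphora) for one language.
const magicWords = ["this", "that", "it"];

// Fill in the selection when the object is an explicit magic word
// or when the object argument is simply missing.
function substituteSelection(parse, selection) {
  const object = parse.arguments.object;
  const isMissing = object === undefined || object === "";
  const isMagic = magicWords.includes(object);
  if (isMissing || isMagic) {
    return {
      ...parse,
      arguments: { ...parse.arguments, object: selection },
    };
  }
  return parse;
}

// Render a suggestion with the verb first or last, depending on whether
// the localized verb form is sentence-initial or sentence-final.
function renderSuggestion(verb, argumentText, verbPosition) {
  return verbPosition === "final"
    ? `${argumentText} ${verb}`
    : `${verb} ${argumentText}`;
}
```

<p>A localization could then mark each verb form as <code>"initial"</code> or <code>"final"</code>, and the suggestion renderer would place it accordingly.</p>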

<p>Let&#8217;s hit the code!</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>German, Dutch, and Greek, for example, are all languages which have both sentence-initial and sentence-final command verb forms.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/big-issues-and-small-issues-with-parser-2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Automating the Linguist&#8217;s Job</title>
		<link>http://mitcho.com/blog/projects/automating-the-linguists-job/</link>
		<comments>http://mitcho.com/blog/projects/automating-the-linguists-job/#comments</comments>
		<pubDate>Tue, 24 Mar 2009 08:57:58 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[analogy]]></category>
		<category><![CDATA[automation]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[deduction]]></category>
		<category><![CDATA[Dutch]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[patterns]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1634</guid>
		<description><![CDATA[At the end of my blog post yesterday I hinted at an exciting possible approach to Ubiquity&#8217;s localization: In the future we ideally could build a web-based system to collect these &#8220;utterances.&#8221; We could &#8230; generate parser parameters based on those sentences. That would essentially reduce the parser-construction process to a more run-of-the-mill string translation [...]
]]></description>
			<content:encoded><![CDATA[<p>At the end of <a href="http://mitcho.com/blog/projects/ubiquity-i18n-questions-to-ask/">my blog post yesterday</a> I hinted at an exciting possible approach to Ubiquity&#8217;s localization:</p>

<blockquote>
  <p>In the future we ideally could build a web-based system to collect these &#8220;utterances.&#8221; We could &#8230; generate parser parameters based on those sentences. That would essentially reduce the parser-construction process to a more run-of-the-mill string translation process.</p>
</blockquote>

<p>If we build this type of &#8220;command-bank&#8221; of common Ubiquity input translated into various languages, we could build a tool to learn various features of each language and generate each parser, essentially <em>learning the language based on data</em>. Today I&#8217;ll elaborate on how I believe this could be possible, by analogy to another language learning device: <strong>the human</strong>.</p>

<p><span id="more-1634"></span></p>

<h3>Step 1: learning words</h3>

<p>How does a human learn language? Without getting into any <a href="http://en.wikipedia.org/wiki/language acquisition">details or theory</a>, we can say that the input for a language learner is always a combination of <em>linguistic input and a referent</em>. In the case of a child, this could be a pairing of linguistic input with <em>real world stimulus</em>:</p>

<center>

<table style='border:none;'><tr><th>input</th><th>referent</th></tr>
<tr><td style='font-size:2em;color:orange;font-weight:bold;text-align:center;'>“taiyaki!”</td><td><img src='http://farm4.static.flickr.com/3543/3357452751_977fcce70c.jpg?v=0' width='300'/><br/>
by <a href='http://www.flickr.com/photos/makitani/3357452751/'>makitani</a> via <a href='http://creativecommons.org'>creative commons</a>.</td></tr>
<tr><td style='font-size:2em;color:orange;font-weight:bold;width:50%;text-align:center;'>“cat!”</td><td><img src='http://farm4.static.flickr.com/3285/2387513295_2768ddf662.jpg?v=0' width='300'/><br />
by <a href='http://www.flickr.com/photos/victoriachan/2387513295/in/set-72157604986983169/'>victoriachan</a> via <a href='http://creativecommons.org'>creative commons</a>.</td></tr>
</table>

</center>

<p>The human child will hear &#8220;cat&#8221; while looking at the cat and, with time and repetition, learn that that thing is called a &#8220;cat,&#8221; and <a href="http://en.wikipedia.org/wiki/taiyaki">some other thing</a> is called &#8220;taiyaki.&#8221;</p>

<p>Similarly, we could take single-verb data points from our command-bank to match new words with a known referent, in this case the base English string. Here&#8217;s an example from <a href="http://jan.moesen.nu/">Jan&#8217;s</a> comment on <a href="http://mitcho.com/blog/projects/ubiquity-i18n-questions-to-ask/">yesterday&#8217;s sample survey</a>.</p>

<center>

<table style='border:none;'><tr><th>input (Dutch)</th><th>referent (English)</th></tr>
<tr><td style='font-size:2em;color:orange;font-weight:bold;text-align:center;'>zoek</td><td style='font-size:2em;color:blue;font-weight:bold;text-align:center;'>search</td></tr>
</table>

</center>
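<p>In code, this first step is little more than building a lookup table. Here is a minimal JavaScript sketch, assuming a hypothetical command-bank format in which each entry pairs a translated input with its English referent. The <code>commandBank</code> array and the <code>learnWords</code> function are names of my own invention; the Dutch strings are from Jan&#8217;s data in this post.</p>

```javascript
// Hypothetical command-bank entries: translated input plus English referent.
const commandBank = [
  { input: "zoek", referent: "search" },
  { input: "zoek HELLO met Google", referent: "search HELLO with Google" },
];

// Learn word-to-word mappings from single-word data points only;
// multi-word entries are left for the deduction step.
function learnWords(entries) {
  const lexicon = new Map();
  for (const { input, referent } of entries) {
    const isSingleWord = [input, referent].every((s) => /^\S+$/.test(s.trim()));
    if (isSingleWord) lexicon.set(input.trim(), referent.trim());
  }
  return lexicon;
}

const lexicon = learnWords(commandBank);
```

<p>After this runs, <code>lexicon.get("zoek")</code> gives back <code>"search"</code>, and the longer sentence is ignored until we have a way to take it apart.</p>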

<h3>Step 2: deduction</h3>

<p>Now suppose we know some single words like &#8220;taiyaki&#8221; and &#8220;cat.&#8221; Consider the following two situations. Given the first sentence and its referent, &#8220;mitcho&#8217;s eating a taiyaki,&#8221; the child could intuit the appropriate linguistic representation for the second situation.</p>

<center>

<table style='border:none;'><tr><th>input</th><th>referent</th></tr>
<tr><td style='font-size:2em;color:orange;font-weight:bold;width:50%;text-align:center;'>“mitcho&#8217;s eating a taiyaki!”</td><td><img src="http://mitcho.com/blog/wp-content/uploads/2009/03/eattaiyaki.jpg" alt="eattaiyaki.jpg" border="0" width="300" height="225" /></td></tr>
<tr><td style='font-size:2em;color:red;font-weight:bold;text-align:center;'>???</td><td><img src="http://mitcho.com/blog/wp-content/uploads/2009/03/eatcat.jpg" alt="eatcat.jpg" border="0" width="300" height="225" /></td></tr>
</table>

</center>

<p>The process is simple. First note that there is only one variable changed between the two situations: the taiyaki has been replaced by a cat head. You can then construct the correct utterance <em>by analogy</em>, replacing &#8220;taiyaki&#8221; with &#8220;cat,&#8221; yielding &#8220;mitcho&#8217;s eating a cat!&#8221;<sup id="fnref:2"><a href="#fn:2" rel="footnote">1</a></sup></p>

<p>Similarly, we could build a tool to analyze the data in a translated command-bank to identify particular features of each language, generating at least basic parsers for each language. Such a task would require a number of <em><a href="http://en.wikipedia.org/wiki/minimal pairs">minimal pairs</a></em> in our data set—here&#8217;s one such example from yesterday&#8217;s survey (with Dutch data from <a href="http://jan.moesen.nu/">Jan</a>):</p>

<center>

<table style='border:none;'><tr><th>input (Dutch)</th><th>referent (English)</th></tr>
<tr><td style='font-size:1.5em;color:orange;font-weight:bold;text-align:center;'>zoek HELLO met Google</td><td>
<span style='font-size:1.5em;color:blue;font-weight:bold;'>search HELLO with Google</span><br/>
<code>
<pre>Parse {
  verb:      'search',
  arguments: {
    object:  ['HELLO'],
    service: 'Google'
  }
}</pre>
</code></td></tr>
<tr><td style='font-size:1.5em;color:orange;font-weight:bold;text-align:center;'>zoek dit met Google</td><td>
<span style='font-size:1.5em;color:blue;font-weight:bold;'>search this with Google</span><br/>
<code>
<pre>Parse {
  verb:      'search',
  arguments: {
    object:  ['this'],
    service: 'Google'
  }
}</pre>
</code></td></tr></table>

</center>

<p>A simple string analysis<sup id="fnref:3"><a href="#fn:3" rel="footnote">2</a></sup> would tell us that the text <code>HELLO</code> was replaced by <code>dit</code> in the latter Dutch sentence. Meanwhile, since the English reference sentence is chosen manually, we also know the appropriate parses for each of those sentences. An object difference operation would note that the <code>object</code> property was changed from a value of <code>'HELLO'</code> to <code>'this'</code>. We could then map <code>dit</code> to the English <code>this</code>. We&#8217;ve now learned one (of perhaps many) Dutch deictic pronouns (aka &#8220;magic words&#8221;).</p>
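<p>Here is a minimal JavaScript sketch of that deduction. It assumes whitespace-delimited words, two sentences of equal length which differ in exactly one position, and flat parses whose argument values are plain strings (the parses above wrap the object in an array; I flatten it here for simplicity). The function names <code>diffTokens</code>, <code>diffArguments</code>, and <code>learnFromMinimalPair</code> are hypothetical.</p>

```javascript
// Find the single position where two equal-length token lists differ.
// Returns null unless exactly one token changed.
function diffTokens(a, b) {
  const tokensA = a.split(/\s+/);
  const tokensB = b.split(/\s+/);
  if (tokensA.length !== tokensB.length) return null;
  const changed = tokensA
    .map((token, i) => ({ from: token, to: tokensB[i] }))
    .filter(({ from, to }) => from !== to);
  return changed.length === 1 ? changed[0] : null;
}

// Find the single argument whose value differs between two parses.
function diffArguments(parseA, parseB) {
  const changed = Object.keys(parseA.arguments)
    .filter((role) => parseA.arguments[role] !== parseB.arguments[role])
    .map((role) => ({
      role,
      from: parseA.arguments[role],
      to: parseB.arguments[role],
    }));
  return changed.length === 1 ? changed[0] : null;
}

// Pair the changed word in the translation with the changed argument value.
function learnFromMinimalPair(pairA, pairB) {
  const stringDiff = diffTokens(pairA.input, pairB.input);
  const parseDiff = diffArguments(pairA.parse, pairB.parse);
  if (stringDiff === null || parseDiff === null) return null;
  return { word: stringDiff.to, meaning: parseDiff.to, role: parseDiff.role };
}

const learned = learnFromMinimalPair(
  {
    input: "zoek HELLO met Google",
    parse: { verb: "search", arguments: { object: "HELLO", service: "Google" } },
  },
  {
    input: "zoek dit met Google",
    parse: { verb: "search", arguments: { object: "this", service: "Google" } },
  }
);
// learned is { word: "dit", meaning: "this", role: "object" }
```

<p>Returning <code>null</code> whenever more than one thing changed keeps the tool from learning anything out of pairs which are not truly minimal. Translations which also reorder words would need a smarter alignment than this position-by-position comparison.</p>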

<p>Given <a href="http://mitcho.com/code/ubiquity/parser-demo/">an adequately universal but customizable parser design</a>, we can then develop tests for various parameters by constructing appropriate <a href="http://en.wikipedia.org/wiki/minimal pairs">minimal pairs</a> in the base sentences and having them translated.<sup id="fnref:1"><a href="#fn:1" rel="footnote">3</a></sup> As noted yesterday, such a system could reduce the laborious task of writing individual parsers to a task of string translation, which <a href="https://wiki.mozilla.org/L10n:Home_Page">our community does exceedingly well</a>. <strong>I&#8217;m eager to hear what others think of this approach. What concerns would you have for this approach? What potential benefits do you see?</strong></p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:2">
<p>I mean no offense to human children with this simplified example. Surely you can learn more than just string replacements.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:3">
<p>I started building some string analysis toys in JavaScript today, such as a <a href="http://mitcho.com/code/ubiquity/levenshtein/">Levenshtein difference demo</a>.&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:1">
<p>The linguists in the audience may note that this parser&#8217;s modular design is indeed in the spirit of the <a href="http://en.wikipedia.org/wiki/principles and parameters">principles and parameters</a> framework.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
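<p>For reference, the Levenshtein measure mentioned in the footnotes above can be computed with the standard dynamic-programming recurrence. This is a textbook sketch in JavaScript, not the code behind the linked demo, and the function name <code>levenshtein</code> is my own.</p>

```javascript
// Levenshtein distance: the minimum number of single-character insertions,
// deletions, and substitutions needed to turn string a into string b.
function levenshtein(a, b) {
  // Before row i is processed, previous[j] holds the distance between
  // the first i - 1 characters of a and the first j characters of b.
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitution = previous[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      const deletion = previous[j] + 1;
      const insertion = current[j - 1] + 1;
      current.push(Math.min(substitution, deletion, insertion));
    }
    previous = current;
  }
  return previous[b.length];
}
```

<p>Keeping only two rows at a time makes the memory use proportional to the length of <code>b</code> rather than to the full table.</p>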
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/automating-the-linguists-job/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

