<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>mitcho.com &#187; linguistics</title>
	<atom:link href="http://mitcho.com/blog/tag/linguistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://mitcho.com</link>
	<description></description>
	<lastBuildDate>Fri, 10 Feb 2012 23:24:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4-alpha-19719</generator>
		<item>
		<title>Better Linguist List RSS Feeds</title>
		<link>http://mitcho.com/blog/projects/better-linguist-list-rss-feeds/</link>
		<comments>http://mitcho.com/blog/projects/better-linguist-list-rss-feeds/#comments</comments>
		<pubDate>Mon, 26 Apr 2010 23:18:14 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[annoyances]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[Linguist List]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[RSS]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=3555</guid>
		<description><![CDATA[Everyone I know in linguistics uses the LINGUIST List website to a greater or lesser degree. Linguist List began as a mailing list in the 90&#8217;s, with book, job, and dissertation announcements, call-for-papers, and general academic discussions. Nowadays many people follow the various announcements on Linguist List using an RSS feed reader such as Google [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/automating-the-linguists-job/' rel='bookmark' title='Automating the Linguist&#8217;s Job'>Automating the Linguist&#8217;s Job</a></li>
<li><a href='http://mitcho.com/blog/link/the-hit-list-better-software-through-less-ui/' rel='bookmark' title='The Hit List: Better Software Through Less UI'>The Hit List: Better Software Through Less UI</a></li>
<li><a href='http://mitcho.com/blog/projects/localizing-ubiquity-an-open-letter-to-linguists/' rel='bookmark' title='Localizing Ubiquity: an open letter to linguists'>Localizing Ubiquity: an open letter to linguists</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p><a href="http://mitcho.com/blog/wp-content/uploads/2010/04/linguistlist.png"><img src="http://mitcho.com/blog/wp-content/uploads/2010/04/linguistlist.png" alt="" title="linguistlist" width="268" height="99" class="alignright size-full wp-image-3556" /></a></p>

<p>Everyone I know in linguistics uses the <a href="http://linguistlist.org/">LINGUIST List</a> website to a greater or lesser degree. Linguist List began as a mailing list in the 90&#8217;s, with book, job, and dissertation announcements, call-for-papers, and general academic discussions.</p>

<p>Nowadays many people follow the various announcements on Linguist List using an RSS feed reader such as <a href="http://google.com/reader">Google Reader</a> or my personal favorite <a href="http://netnewswireapp.com/">NetNewsWire</a>.</p>

<p>Unfortunately, the Linguist List RSS feeds (at least recently) don&#8217;t include the full text of the articles and have a few other quirks as well. It&#8217;s often hard to judge based on the title whether it&#8217;s really something I&#8217;m interested in or not, so I&#8217;ve spent a lot of time frustratedly opening any possibly interesting-looking entry in a separate NetNewsWire tab. Today I decided enough was enough: I just wrote a script which parses each of the Linguist List RSS feeds, finds the actual descriptions, and interleaves them.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> It&#8217;s working remarkably well so far:</p>

<p><span id="more-3555"></span></p>

<p><a rel='imagebox' href="http://mitcho.com/blog/wp-content/uploads/2010/04/Screen-shot-2010-04-26-at-6.41.07-PM.png"><img src="http://mitcho.com/blog/wp-content/uploads/2010/04/Screen-shot-2010-04-26-at-6.41.07-PM.png" alt="" title="Screen shot 2010-04-26 at 6.41.07 PM" width="650" height="365" class="alignright size-full wp-image-3557" /></a></p>

<p>Here are the RSS links for all of the feeds. I plan on keeping these up and maintained for the foreseeable future (or until Linguist List improves their own RSS feeds!), so feel free to subscribe. Please do let me know if you run into any issues or have a suggestion. <img src='http://mitcho.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>

<table>
<tr><td><a href="http://feeds.feedburner.com/LinguistListMostRecent" type="application/rss+xml"><img src='http://feedburner.google.com/fb/lib/images/icons/feed-icon-12x12-orange.gif'/> Most recent issues</a></td><td><a href="http://feeds.feedburner.com/LinguistListMostRecent"><img src="http://feeds.feedburner.com/~fc/LinguistListMostRecent?bg=99CCFF&#038;fg=444444&#038;anim=0" height="26" width="88" style="border:0" alt="" /></a></td></tr>
<tr><td><a href="http://feeds.feedburner.com/LinguistListAll" type="application/rss+xml"><img src='http://feedburner.google.com/fb/lib/images/icons/feed-icon-12x12-orange.gif'/> All (announcements)</a></td><td><a href="http://feeds.feedburner.com/LinguistListAll"><img src="http://feeds.feedburner.com/~fc/LinguistListAll?bg=99CCFF&#038;fg=444444&#038;anim=0" height="26" width="88" style="border:0" alt="" /></a></td></tr>
<tr><td><a href="http://feeds.feedburner.com/LinguistListBooks" type="application/rss+xml"><img src='http://feedburner.google.com/fb/lib/images/icons/feed-icon-12x12-orange.gif'/> Book announcements</a></td><td><a href="http://feeds.feedburner.com/LinguistListBooks"><img src="http://feeds.feedburner.com/~fc/LinguistListBooks?bg=99CCFF&#038;fg=444444&#038;anim=0" height="26" width="88" style="border:0" alt="" /></a></td></tr>
<tr><td><a href="http://feeds.feedburner.com/LinguistListCalls" type="application/rss+xml"><img src='http://feedburner.google.com/fb/lib/images/icons/feed-icon-12x12-orange.gif'/> Call for papers</a></td><td><a href="http://feeds.feedburner.com/LinguistListCalls"><img src="http://feeds.feedburner.com/~fc/LinguistListCalls?bg=99CCFF&#038;fg=444444&#038;anim=0" height="26" width="88" style="border:0" alt="" /></a></td></tr>
<tr><td><a href="http://feeds.feedburner.com/LinguistListConfs" type="application/rss+xml"><img src='http://feedburner.google.com/fb/lib/images/icons/feed-icon-12x12-orange.gif'/> Conference announcements</a></td><td><a href="http://feeds.feedburner.com/LinguistListConfs"><img src="http://feeds.feedburner.com/~fc/LinguistListConfs?bg=99CCFF&#038;fg=444444&#038;anim=0" height="26" width="88" style="border:0" alt="" /></a></td></tr>
<tr><td><a href="http://feeds.feedburner.com/LinguistListDisc" type="application/rss+xml"><img src='http://feedburner.google.com/fb/lib/images/icons/feed-icon-12x12-orange.gif'/> Discussions</a></td><td><a href="http://feeds.feedburner.com/LinguistListDisc"><img src="http://feeds.feedburner.com/~fc/LinguistListDisc?bg=99CCFF&#038;fg=444444&#038;anim=0" height="26" width="88" style="border:0" alt="" /></a></td></tr>
<tr><td><a href="http://feeds.feedburner.com/LinguistListDiss" type="application/rss+xml"><img src='http://feedburner.google.com/fb/lib/images/icons/feed-icon-12x12-orange.gif'/> Dissertation announcements</a></td><td><a href="http://feeds.feedburner.com/LinguistListDiss"><img src="http://feeds.feedburner.com/~fc/LinguistListDiss?bg=99CCFF&#038;fg=444444&#038;anim=0" height="26" width="88" style="border:0" alt="" /></a></td></tr>
<tr><td><a href="http://feeds.feedburner.com/LinguistListFYI" type="application/rss+xml"><img src='http://feedburner.google.com/fb/lib/images/icons/feed-icon-12x12-orange.gif'/> FYI</a></td><td><a href="http://feeds.feedburner.com/LinguistListFYI"><img src="http://feeds.feedburner.com/~fc/LinguistListFYI?bg=99CCFF&#038;fg=444444&#038;anim=0" height="26" width="88" style="border:0" alt="" /></a></td></tr>
<tr><td><a href="http://feeds.feedburner.com/LinguistListJobs" type="application/rss+xml"><img src='http://feedburner.google.com/fb/lib/images/icons/feed-icon-12x12-orange.gif'/> Job announcements</a></td><td><a href="http://feeds.feedburner.com/LinguistListJobs"><img src="http://feeds.feedburner.com/~fc/LinguistListJobs?bg=99CCFF&#038;fg=444444&#038;anim=0" height="26" width="88" style="border:0" alt="" /></a></td></tr>
<tr><td><a href="http://feeds.feedburner.com/LinguistListMedia" type="application/rss+xml"><img src='http://feedburner.google.com/fb/lib/images/icons/feed-icon-12x12-orange.gif'/> Topics in the Media</a></td><td><a href="http://feeds.feedburner.com/LinguistListMedia"><img src="http://feeds.feedburner.com/~fc/LinguistListMedia?bg=99CCFF&#038;fg=444444&#038;anim=0" height="26" width="88" style="border:0" alt="" /></a></td></tr>
<tr><td><a href="http://feeds.feedburner.com/LinguistListQs" type="application/rss+xml"><img src='http://feedburner.google.com/fb/lib/images/icons/feed-icon-12x12-orange.gif'/> Queries</a></td><td><a href="http://feeds.feedburner.com/LinguistListQs"><img src="http://feeds.feedburner.com/~fc/LinguistListQs?bg=99CCFF&#038;fg=444444&#038;anim=0" height="26" width="88" style="border:0" alt="" /></a></td></tr>
<tr><td><a href="http://feeds.feedburner.com/LinguistListReview" type="application/rss+xml"><img src='http://feedburner.google.com/fb/lib/images/icons/feed-icon-12x12-orange.gif'/> Book reviews</a></td><td><a href="http://feeds.feedburner.com/LinguistListReview"><img src="http://feeds.feedburner.com/~fc/LinguistListReview?bg=99CCFF&#038;fg=444444&#038;anim=0" height="26" width="88" style="border:0" alt="" /></a></td></tr>
<tr><td><a href="http://feeds.feedburner.com/LinguistListSoftware" type="application/rss+xml"><img src='http://feedburner.google.com/fb/lib/images/icons/feed-icon-12x12-orange.gif'/> Software announcements</a></td><td><a href="http://feeds.feedburner.com/LinguistListSoftware"><img src="http://feeds.feedburner.com/~fc/LinguistListSoftware?bg=99CCFF&#038;fg=444444&#038;anim=0" height="26" width="88" style="border:0" alt="" /></a></td></tr>
<tr><td><a href="http://feeds.feedburner.com/LinguistListSum" type="application/rss+xml"><img src='http://feedburner.google.com/fb/lib/images/icons/feed-icon-12x12-orange.gif'/> Summaries of Query Responses</a></td><td><a href="http://feeds.feedburner.com/LinguistListSum"><img src="http://feeds.feedburner.com/~fc/LinguistListSum?bg=99CCFF&#038;fg=444444&#038;anim=0" height="26" width="88" style="border:0" alt="" /></a></td></tr>
<tr><td><a href="http://feeds.feedburner.com/LinguistListSupport" type="application/rss+xml"><img src='http://feedburner.google.com/fb/lib/images/icons/feed-icon-12x12-orange.gif'/> Support for Students</a></td><td><a href="http://feeds.feedburner.com/LinguistListSupport"><img src="http://feeds.feedburner.com/~fc/LinguistListSupport?bg=99CCFF&#038;fg=444444&#038;anim=0" height="26" width="88" style="border:0" alt="" /></a></td></tr>
</table>

<p>You can find descriptions of the content of each of these feeds <a href="http://linguistlist.org/issues/rss/topics.cfm">on Linguist List&#8217;s RSS feeds page</a>.</p>

<p><a href="http://www.feedburner.com" target="_blank"><img src="http://www.feedburner.com/fb/images/pub/i_heart_fb.gif" alt="Powered by FeedBurner" style="border:0"/></a><a href="http://scripts.mit.edu/">
<img alt="powered by scripts.mit.edu"
src="http://scripts.mit.edu/media/powered_by.gif" /></a></p>

<p>Perhaps in the future I&#8217;ll do a &#8220;how to&#8221; post as well with the code I use to make this happen. I&#8217;ll note that it&#8217;s just 70 lines of PHP (including whitespace) and could no doubt be tightened up. <img src='http://mitcho.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
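
<p>A minimal sketch of the general idea, in Python rather than the PHP mentioned above, might look like the following. It is an illustration only, not the actual script: the function name <code>enrich_feed</code> and the <code>fetch_full_text</code> callback are hypothetical, and how the full text is pulled out of each Linguist List page is left entirely to that callback.</p>

```python
import xml.etree.ElementTree as ET
from typing import Callable


def enrich_feed(feed_xml: str, fetch_full_text: Callable[[str], str]) -> str:
    """Return a copy of an RSS 2.0 feed whose item descriptions hold the full text.

    fetch_full_text is a hypothetical callback: given an item's link, it
    returns the full announcement text (for example by downloading the page).
    """
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        if not link:
            continue  # no link, so there is nothing to look up for this item
        description = item.find("description")
        if description is None:
            # Some items may carry no description at all; add one.
            description = ET.SubElement(item, "description")
        description.text = fetch_full_text(link)
    return ET.tostring(root, encoding="unicode")
```

<p>In real use, <code>fetch_full_text</code> would download the linked page (for instance with <code>urllib.request.urlopen</code>) and extract the announcement body; passing it in as a parameter keeps the feed rewriting separate from the page scraping and makes the function easy to try out with a stub.</p>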

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>Veteran Linguist List RSS subscribers will also note that I&#8217;m adding the full title to the entry title for the Conferences and Calls lists as well.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/automating-the-linguists-job/' rel='bookmark' title='Automating the Linguist&#8217;s Job'>Automating the Linguist&#8217;s Job</a></li>
<li><a href='http://mitcho.com/blog/link/the-hit-list-better-software-through-less-ui/' rel='bookmark' title='The Hit List: Better Software Through Less UI'>The Hit List: Better Software Through Less UI</a></li>
<li><a href='http://mitcho.com/blog/projects/localizing-ubiquity-an-open-letter-to-linguists/' rel='bookmark' title='Localizing Ubiquity: an open letter to linguists'>Localizing Ubiquity: an open letter to linguists</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/better-linguist-list-rss-feeds/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Spring is for Speaking: JSConf, WordCamp SF, IACL</title>
		<link>http://mitcho.com/blog/projects/spring-is-for-speaking/</link>
		<comments>http://mitcho.com/blog/projects/spring-is-for-speaking/#comments</comments>
		<pubDate>Sat, 20 Mar 2010 04:37:04 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[life]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[Boston]]></category>
		<category><![CDATA[Chinese language]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[harvard]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Jetpack]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mandarin]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[San Francisco]]></category>
		<category><![CDATA[talk]]></category>
		<category><![CDATA[Washington D.C.]]></category>
		<category><![CDATA[WordCamp]]></category>
		<category><![CDATA[WordPress]]></category>
		<category><![CDATA[WordPress Planet]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=3448</guid>
		<description><![CDATA[I recently confirmed three different very exciting speaking gigs which I&#8217;ll be doing this spring: JSConf.us: I&#8217;ll be putting my Mozilla Jetpack Ambassador hat on to represent Mozilla Labs&#8217; Jetpack project at the premier Javascript conference in North America, JSConf.us, which this year will be April 17-18 in Washington D.C. and has a pirate theme.1 [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/life/wordcamp-boston-2010/' rel='bookmark' title='WordCamp Boston 2010'>WordCamp Boston 2010</a></li>
<li><a href='http://mitcho.com/blog/projects/jetpacking-in-boston/' rel='bookmark' title='Jetpacking in Boston'>Jetpacking in Boston</a></li>
<li><a href='http://mitcho.com/blog/projects/mashing-up-the-browser-in-maine/' rel='bookmark' title='Mashing up the browser in Maine'>Mashing up the browser in Maine</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>I recently confirmed three different very exciting speaking gigs which I&#8217;ll be doing this spring:</p>

<p><span id="more-3448"></span></p>

<p><strong>JSConf.us</strong>:</p>

<p>I&#8217;ll be putting my Mozilla Jetpack Ambassador hat on to represent Mozilla Labs&#8217; <a href="https://jetpack.mozillalabs.com/">Jetpack project</a> at the premier JavaScript conference in North America, <a href="http://jsconf.us/2010/">JSConf.us</a>, which this year will be April 17-18 in Washington D.C. and has a pirate theme.<sup id="fnref:2"><a href="#fn:2" rel="footnote">1</a></sup> I&#8217;ll be giving a short talk in the main session and will also lead a hands-on Jetpack workshop in the hacker lounge. I&#8217;ve heard that JSConf is a lot of fun and I&#8217;m really looking forward to it! <img src='http://mitcho.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>

<p><strong>WordCamp San Francisco</strong>:</p>

<p>I&#8217;m honored to have been invited to give a talk at <a href="http://2010.sf.wordcamp.org/">WordCamp San Francisco 2010</a>. WordCamps are community-organized events for the <a href="http://wordpress.org">WordPress</a> community, and the San Francisco WordCamp is the original and biggest. WordCamp SF will be at the Mission Bay Conference Center on May 1st. <a href="https://2010.sf.wordcamp.org/tickets/">Tickets available</a>.</p>

<p>My talk is tentatively titled &#8220;Abstract Your Code.&#8221;<sup id="fnref:1"><a href="#fn:1" rel="footnote">2</a></sup> WordPress is a great platform to build amazing content-rich applications on, and many of us have written new functionality in the form of plugins. I hope to encourage developers to make their code more portable and reusable after the project is done—or, ideally, to even start with abstraction in mind—to add to the &#8220;life&#8221; of the code and to consider then open-sourcing that functionality.</p>

<p>Hope to see you there!</p>

<p><strong>International Association of Chinese Linguistics (IACL) 18</strong>:</p>

<p>Finally, I&#8217;m thrilled to say that I got a paper accepted to the <a href="http://www.fas.harvard.edu/~iacl18/Site/index.html">annual meeting of the International Association of Chinese Linguistics</a>, which this year is at Harvard on May 20-22. IACL is <em>the</em> big conference for Chinese linguistics, with about <a href="http://www.fas.harvard.edu/~IACL18/AcceptList.pdf">180 papers</a> being presented. I&#8217;ll be presenting <em>Two</em> Only<em>s in Mandarin Chinese</em>, my recent work on the formal syntax/semantics of two <em>only</em> words in Chinese: <em>zhǐ</em> (只) and <em>éryǐ</em> (而已). I&#8217;ve put up <a href="http://mitcho.com/academic/handout-20100226.pdf">a handout</a> of some of this material in work-in-progress form, which I recently presented at <a href="http://people.fas.harvard.edu/~nicolae/SNEWS_2010/Welcome.html">SNEWS</a>.</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:2">
<p>I&#8217;ll <a href="http://beijinghuar.com">fit right in</a>.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:1">
<p>Sexier title suggestions welcome.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/life/wordcamp-boston-2010/' rel='bookmark' title='WordCamp Boston 2010'>WordCamp Boston 2010</a></li>
<li><a href='http://mitcho.com/blog/projects/jetpacking-in-boston/' rel='bookmark' title='Jetpacking in Boston'>Jetpacking in Boston</a></li>
<li><a href='http://mitcho.com/blog/projects/mashing-up-the-browser-in-maine/' rel='bookmark' title='Mashing up the browser in Maine'>Mashing up the browser in Maine</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/spring-is-for-speaking/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Living in the Stata Center</title>
		<link>http://mitcho.com/blog/life/living-in-the-stata-center/</link>
		<comments>http://mitcho.com/blog/life/living-in-the-stata-center/#comments</comments>
		<pubDate>Mon, 21 Sep 2009 18:49:22 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[life]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[MIT]]></category>
		<category><![CDATA[Stata Center]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2914</guid>
		<description><![CDATA[We&#8217;re now three weeks into the semester at MIT where I just started a PhD program in linguistics. The Linguistics and Philosophy department is housed in The Ray and Maria Stata Center, also known as building 32. It&#8217;s a Frank Gehry building and thus crazy looking.1 We see photographers on the street from time to [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/observation/im-busy-to-die/' rel='bookmark' title='I&#8217;m Busy to Die'>I&#8217;m Busy to Die</a></li>
<li><a href='http://mitcho.com/blog/life/dinner-with-barack-and-hillary/' rel='bookmark' title='Dinner with Barack and Hillary'>Dinner with Barack and Hillary</a></li>
<li><a href='http://mitcho.com/blog/life/travel/linguistics-in-%e5%98%89%e7%be%a9/' rel='bookmark' title='Linguistics in 嘉義'>Linguistics in 嘉義</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>We&#8217;re now three weeks into the semester at MIT, where I just started a PhD program in <a href="http://web.mit.edu/linguistics/">linguistics</a>. The Linguistics and Philosophy department is housed in The Ray and Maria <a href="http://en.wikipedia.org/wiki/Stata_Center">Stata Center</a>, also known as building 32. It&#8217;s a <a href="http://en.wikipedia.org/wiki/Frank_Gehry">Frank Gehry</a> building and thus crazy looking.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<p><zp:boston/statacenter/statacenter02.jpg><zp:boston/statacenter/statacenter09.jpg></p>

<p><span id="more-2914"></span>
We see photographers on the street from time to time taking pictures of the building. In fact, the first day I got to Boston this fall, I walked into my hotel room and there was a framed photo of Stata on the wall.</p>

<p><zp:boston/statacenter/statacenter15.jpg></p>

<p>The department basically occupies the eighth and ninth floors of the <a href="http://en.wikipedia.org/wiki/Alexander_W._Dreyfoos,_Jr.">Dreyfoos</a> tower side. There&#8217;s a nice common space with some red couches and a spiral staircase.</p>

<p><zp:boston/statacenter/statacenter07.jpg><zp:boston/statacenter/statacenter08.jpg></p>

<p>All of us first years share an office together on the ninth floor (32-D976). The room is elliptical, with our desks along the side with windows and the other &#8220;wall&#8221; being a series of wooden bookshelves. It&#8217;s great to share an office with the cohort, as we all take all the same classes the first year as well.</p>

<p><zp:boston/statacenter/statacenter04.jpg><zp:boston/statacenter/statacenter12.jpg><zp:boston/statacenter/statacenter01.jpg><zp:boston/statacenter/statacenter03.jpg></p>

<p>Here&#8217;s the room for &#8220;Meg Data&#8221; just across the hall from our office. I was really hoping it was a woman named &#8220;Meg Data&#8221;&#8230; marrying her would get me one step closer to becoming &#8220;Dr. Data.&#8221; Alas, &#8220;Meg Data&#8221; or &#8220;MEG Data&#8221; apparently is simply a label for some data store—possibly some servers. <img src='http://mitcho.com/blog/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' /> </p>

<p><zp:boston/statacenter/statacenter06.jpg></p>

<p>Most of the building is occupied by the <a href="http://www.csail.mit.edu/">Computer Science and Artificial Intelligence Lab (CSAIL)</a>, so there are posters up for fun-looking talks that I&#8217;m too busy for.</p>

<p><zp:boston/statacenter/statacenter10.jpg></p>

<p>Here&#8217;s my desk (with <a href="http://foxkeh.jp">Foxkeh</a>!) and my new bike, which I&#8217;ve been bringing up to the office. The final photo is of me with an &#8220;I ♥ The Web&#8221; poster in commemoration of <a href="http://mozilla.org/onewebday">OneWebDay</a>.</p>

<p><zp:boston/statacenter/statacenter13.jpg><zp:boston/statacenter/statacenter05.jpg><zp:boston/statacenter/statacenter14.jpg></p>

<p>We&#8217;ve been getting pretty busy with the readings and work for our classes, as well as occasionally hanging out. I hope to write some more posts both linguistic and non-linguistic in the coming days on life at MIT.</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>It also apparently has some structural problems; most notably <a href="http://tig.csail.mit.edu/leaks.pdf">leaks</a>.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/observation/im-busy-to-die/' rel='bookmark' title='I&#8217;m Busy to Die'>I&#8217;m Busy to Die</a></li>
<li><a href='http://mitcho.com/blog/life/dinner-with-barack-and-hillary/' rel='bookmark' title='Dinner with Barack and Hillary'>Dinner with Barack and Hillary</a></li>
<li><a href='http://mitcho.com/blog/life/travel/linguistics-in-%e5%98%89%e7%be%a9/' rel='bookmark' title='Linguistics in 嘉義'>Linguistics in 嘉義</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/life/living-in-the-stata-center/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Report from SIGIR Workshop on Information Access in a Multilingual World</title>
		<link>http://mitcho.com/blog/projects/report-from-sigir-workshop-on-information-access-in-a-multilingual-world/</link>
		<comments>http://mitcho.com/blog/projects/report-from-sigir-workshop-on-information-access-in-a-multilingual-world/#comments</comments>
		<pubDate>Fri, 24 Jul 2009 19:42:27 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[information access]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[paper]]></category>
		<category><![CDATA[presentation]]></category>
		<category><![CDATA[SIGIR]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[workshop]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2547</guid>
		<description><![CDATA[Yesterday I participated in and presented at a workshop on Information Access in a Multilingual World at ACM SIGIR in Boston. The focus of the workshop was on cross-language information retrieval (CLIR). Cross-language information retrieval systems enable users to retrieve relevant information across different languages for a certain task or query. Even if you have [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/presenting-in-boston-at-sigir-workshop/' rel='bookmark' title='Presenting in Boston at SIGIR Workshop'>Presenting in Boston at SIGIR Workshop</a></li>
<li><a href='http://mitcho.com/blog/link/the-hit-list-better-software-through-less-ui/' rel='bookmark' title='The Hit List: Better Software Through Less UI'>The Hit List: Better Software Through Less UI</a></li>
<li><a href='http://mitcho.com/blog/projects/talking-ubiquity-in-japan-%e6%8b%a1%e5%bc%b5%e6%a9%9f%e8%83%bd%e5%8b%89%e5%bc%b7%e4%bc%9a%e3%81%ab%e3%81%a6%e7%99%ba%e8%a1%a8/' rel='bookmark' title='Talking Ubiquity in Japan: 拡張機能勉強会にて発表'>Talking Ubiquity in Japan: 拡張機能勉強会にて発表</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>Yesterday I participated in and presented at <a href="http://www.sics.se/events/clir2009">a workshop on Information Access in a Multilingual World</a> at <a href="http://sigir2009.org/">ACM SIGIR</a> in Boston. The focus of the workshop was on <a href="http://en.wikipedia.org/wiki/cross-language_information_retrieval">cross-language information retrieval</a> (CLIR). Cross-language information retrieval systems enable users to retrieve relevant information across different languages for a certain task or query. Even if you have a budget to translate some documents from a foreign language to your language, how do you find the relevant documents to translate in the first place if you don&#8217;t speak (or read) that source language? This is the type of problem that CLIR aims to solve.</p>

<p><span id="more-2547"></span></p>

<p>The keynote speaker was <a href="http://www.linkedin.com/in/ralfsteinberger">Ralf Steinberger</a> of the <a href="http://www.jrc.ec.europa.eu/">European Commission&#8217;s Joint Research Centre</a>, presenting the <a href="http://emm.jrc.it/overview.html">European Media Monitor</a> family of applications. EMM is a suite of applications, all based on a core platform which aggregates news stories from a variety of sources around the world in a few dozen different languages. The system then uses various CLIR techniques to cluster stories by event, regardless of language or source country. Large news agencies and organizations, as well as the European Commission itself, use the system to track upcoming news stories as well as health concerns. In the European Union, which has over a dozen different &#8220;official&#8221; languages, there is a great need for this type of service. The applications are available to the public, so I invite you to play around with them <a href="http://emm.jrc.it/overview.html">at the EMM site</a>.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<p>The workshop&#8217;s paper presentations were quite diverse. <a href="http://storm.cis.fordham.edu/~filatova/">Elena Filatova</a> presented an interesting approach to measuring the &#8220;trustworthiness&#8221; of statements in Wikipedia entries by comparing the overlap in content statements between different language entries (more overlap => more trustworthy), which can be used to create <a href="http://en.wikipedia.org/wiki/inverted_pyramid">inverted pyramid</a> summaries. Perspectives from different use cases were also examined through presentations on news analysts and health organizations, patent searches, and medical information for personal use, as well as discussion of the need for CLIR for historical and religious texts.</p>

<p>One interesting thread throughout the day&#8217;s sessions was the issue of loanword processes and the possible use of romanization as an interlingua. Cross-language <a href="http://en.wikipedia.org/wiki/named_entity_recognition">named entity recognition</a> is a major problem in CLIR. Many novel words and names (common in news articles) go through processes of loanword adaptation and transliteration, so they are hard to identify and are missing from the systems&#8217; dictionaries. A few of the talks touched on these problems, including a great talk by Kashif Riaz of <a href="http://www.umn.edu">the U of M</a> on the salient (and great) differences between Hindi and Urdu. <a href="http://ucdata.berkeley.edu/gey.html">Frederic Gey</a> described a number of different approaches to comparing strings via a romanization interlingua.</p>

<p>Another interesting thread was the idea of the target user. The needs of CLIR applications can vary greatly depending on their use cases and the users&#8217; savvy. The needs of a patent search office, where many professional searchers are already multilingual, are clearly different from the needs of an executive hoping to stay on top of the world news related to their organization. It was brought up, however, that in this age of open data and APIs, if a CLIR resource provides a good API, it need not necessarily supply the perfect interface for every type of user, as third parties could develop such targeted interfaces.</p>

<p>While most of the presentations and research interests in the room were on users accessing resources in various languages, I presented on <a href="http://ubiquity.mozilla.com">Ubiquity</a>. The talk was intended to highlight the opposite problem, which is different but also worthwhile: letting users who speak different languages get at the same kinds of information with equally powerful user experiences. Below are the slides from this talk. As a web archive of all the papers will be set up soon, I believe it&#8217;s safe to put up my paper as well, so please check it out. It&#8217;s a good, short (four-page) overview of the innovative approaches we&#8217;ve taken to build a localizable natural language interface.</p>

<div style="width:650px;text-align:left" id="__ss_1766079"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" href="http://www.slideshare.net/mitcho/ubiquity-designing-a-multilingual-natural-language-interface" title="Ubiquity: Designing a Multilingual Natural Language Interface">Ubiquity: Designing a Multilingual Natural Language Interface</a><object style="margin:0px" width="649" height="542"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sigirnovideo-090724143940-phpapp02&#038;rel=0&#038;stripped_title=ubiquity-designing-a-multilingual-natural-language-interface" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sigirnovideo-090724143940-phpapp02&#038;rel=0&#038;stripped_title=ubiquity-designing-a-multilingual-natural-language-interface" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="649" height="542"></embed></object></div>

<div class="files">
<div class="file pdf">
<a href='http://mitcho.com/academic/erlewine-sigir.pdf'>&#8220;Ubiquity: Designing a Multilingual Natural Language Interface.&#8221;</a> Presented at <a href='http://www.sics.se/events/clir2009'>SIGIR Workshop on Information Access in a Multilingual World</a>, Boston, July 2009. <i>To appear in the proceedings.</i>
<span class="specs">240&#160;kb - pdf</span>
</div>
</div>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>Unfortunately, there is no public API available for these resources. We asked. <img src='http://mitcho.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> &#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/presenting-in-boston-at-sigir-workshop/' rel='bookmark' title='Presenting in Boston at SIGIR Workshop'>Presenting in Boston at SIGIR Workshop</a></li>
<li><a href='http://mitcho.com/blog/link/the-hit-list-better-software-through-less-ui/' rel='bookmark' title='The Hit List: Better Software Through Less UI'>The Hit List: Better Software Through Less UI</a></li>
<li><a href='http://mitcho.com/blog/projects/talking-ubiquity-in-japan-%e6%8b%a1%e5%bc%b5%e6%a9%9f%e8%83%bd%e5%8b%89%e5%bc%b7%e4%bc%9a%e3%81%ab%e3%81%a6%e7%99%ba%e8%a1%a8/' rel='bookmark' title='Talking Ubiquity in Japan: 拡張機能勉強会にて発表'>Talking Ubiquity in Japan: 拡張機能勉強会にて発表</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/report-from-sigir-workshop-on-information-access-in-a-multilingual-world/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Ubiquity Localization: What&#8217;s New, What&#8217;s Next</title>
		<link>http://mitcho.com/blog/projects/ubiquity-localization-whats-new-whats-next/</link>
		<comments>http://mitcho.com/blog/projects/ubiquity-localization-whats-new-whats-next/#comments</comments>
		<pubDate>Thu, 09 Jul 2009 19:45:25 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[command]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[internationalization]]></category>
		<category><![CDATA[l10n]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[localization]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[nountype]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[participate]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[verb]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2432</guid>
		<description><![CDATA[Yesterday we released Ubiquity 0.5, a major update to the already popular Ubiquity platform. Among numerous other features, Ubiquity 0.5 includes the first fruit of months of research on building a multilingual parser and natural language interface. In this blog post I&#8217;ll give a quick overview of new internationalization-related features in Ubiquity 0.5 as well [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/localizing-commands-for-ubiquity-0-5/' rel='bookmark' title='Localizing Commands for Ubiquity 0.5'>Localizing Commands for Ubiquity 0.5</a></li>
<li><a href='http://mitcho.com/blog/projects/localizing-ubiquity-commands-and-nountypes/' rel='bookmark' title='Localizing Ubiquity: commands and nountypes'>Localizing Ubiquity: commands and nountypes</a></li>
<li><a href='http://mitcho.com/blog/projects/big-issues-and-small-issues-with-parser-2/' rel='bookmark' title='Big Issues and Small Issues with Parser 2'>Big Issues and Small Issues with Parser 2</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>Yesterday we <a href="https://labs.mozilla.com/2009/07/ubiquity-0-5/">released Ubiquity 0.5</a>, a major update to the already popular Ubiquity platform. Among <a href="https://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_0.5_Release_Notes">numerous other features</a>, Ubiquity 0.5 includes the first fruit of <a href="http://mitcho.com/blog/tag/ubiquity/">months of research on building a multilingual parser and natural language interface</a>. In this blog post I&#8217;ll give a quick overview of new internationalization-related features in Ubiquity 0.5 as well as a quick roadmap of future considerations.</p>

<p>Of course, one of the best ways to learn about the new features is to experience them&#8230; try Ubiquity 0.5 now!</p>

<p><a href="https://ubiquity.mozilla.com/xpi/0.5/ubiquity-0.5.xpi" style="cursor:pointer;background: #01d835;border: 1px solid;border-color:#01d835 #4ece71 #4ece71 #01d835;-moz-border-radius:4px;padding:10px;text-transform:uppercase;font-size:1.3em;color:white;text-shadow:#1e792c 1px 1px 1px;">Install now!</a></p>

<p><span id="more-2432"></span></p>

<h3>Preface: What&#8217;s What</h3>

<p>To give users a completely localized experience, many different components need to work with different languages. In a single Ubiquity input, like</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
</pre></td><td class="code"><pre class="en" style="font-family:monospace;">translate hello from English to Spanish</pre></td></tr></table></div>


<p>there are actually several components that all need to be localized before Ubiquity can comprehend the equivalent sentence in a different language. The table below gives a sense of which components those are: the parser, verbs, and nountypes.</p>

<table>
<tr><th>input:</th><td>translate</td><td>hello</td><td>from</td><td>English</td><td>to</td><td>Spanish</td></tr>
<tr><th>element type:</th><td>verb</td><td>free argument</td><td>delimiter</td><td>structured argument</td><td>delimiter</td><td>structured argument</td></tr>
<tr><th>component to localize:</th><td>verb name</td><td>&nbsp;</td><td>parser</td><td>nountype</td><td>parser</td><td>nountype</td></tr>
</table>

<h3>What&#8217;s New</h3>

<p>Ubiquity 0.5&#8217;s improved language support can be thought of as the product of two more or less orthogonal developments: the brand-new parser, Parser 2, and localization support for bundled commands.</p>

<h4>Parser 2</h4>

<p>Parser 2 (née <a href="http://mitcho.com/blog/projects/this-week-on-ubiquity-parser-the-next-generation/">Parser: The Next Generation</a>) is a completely new parser designed to support different languages easily. Taking a serious look at the similarities and differences between different languages, we created a universal <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2">parser design</a> which takes a minimal set of settings for particular languages to &#8220;learn&#8221; that language&#8217;s grammar.</p>

<p>The key insight behind Parser 2&#8217;s design is that languages handle the limited range of inputs Ubiquity should understand in remarkably similar ways. The inputs we&#8217;re dealing with here are all commands or actions without quantification or negation. Each is composed of a single verb and a series of arguments with certain markings to designate their roles in the sentence. Here again is our example Ubiquity input:</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
</pre></td><td class="code"><pre class="en" style="font-family:monospace;">translate hello from English to Spanish</pre></td></tr></table></div>


<p>In this example, &#8220;translate&#8221; is the verb, which we recognize by looking at our bank of known verbs, and the rest of the input can be split up into three different arguments: &#8220;hello,&#8221; &#8220;from English,&#8221; and &#8220;to Spanish.&#8221; Of these, the markers &#8220;from&#8221; and &#8220;to&#8221; tell us that &#8220;English&#8221; is a <em>source</em> of some sort and &#8220;Spanish&#8221; is a <em>goal</em>, while the unmarked &#8220;hello&#8221; is simply an <em>object</em>—the target of the action. By identifying arguments by these abstract <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2/Semantic_Roles"><em>semantic roles</em></a>, we&#8217;re able to quickly identify different kinds of arguments in different languages. For example, the following is the exact same example but using the Japanese syntax and markers:</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>2
</pre></td><td class="code"><pre class="en" style="font-family:monospace;">helloをEnglishからSpanishにtranslate</pre></td></tr></table></div>


<p>Ubiquity knows what the different markers mean in Japanese, like &#8220;を&#8221; > <code>object</code>, &#8220;から&#8221; > <code>source</code>, &#8220;に&#8221; > <code>goal</code>, and can easily interpret this to mean the exact same command as (1). With just a few lines of code, <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2/Localization_Tutorial">you can teach</a> Ubiquity how to recognize these different semantic roles in your language. This innovation also means that Ubiquity commands can be <a href="http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/">written once for one language and automatically used with another language&#8217;s parser</a>, bringing us half-way to the goal of command localization.</p>
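
<p>As a rough sketch of this idea (not the actual Parser 2 code), the marker-to-role mapping can be pictured as a small settings object per language. The names <code>languages</code>, <code>parse</code>, <code>verbFinal</code>, <code>markersFollow</code>, and <code>spaces</code> below are all made up for illustration, and the splitting is deliberately naive: it commits to a single parse, so it would wrongly split a word that happens to contain a marker.</p>

```javascript
// Hypothetical per-language settings; the real Parser 2 settings are richer.
const languages = {
  en: {
    verbFinal: false,     // the verb comes first
    markersFollow: false, // markers precede their argument ("from English")
    spaces: true,         // words are separated by spaces
    markers: { from: 'source', to: 'goal' },
  },
  ja: {
    verbFinal: true,      // the verb comes last
    markersFollow: true,  // markers follow their argument ("Englishから")
    spaces: false,        // no spaces between words
    markers: { 'を': 'object', 'から': 'source', 'に': 'goal' },
  },
};

function parse(input, settings, knownVerbs) {
  // Look for a known verb at the edge of the input where this language puts it.
  const verb = knownVerbs.find((v) =>
    settings.verbFinal ? input.endsWith(v) : input.startsWith(v));
  if (verb === undefined) return null;
  const rest = settings.verbFinal
    ? input.slice(0, input.length - verb.length)
    : input.slice(verb.length);

  // Split on the markers. The capturing group keeps the markers in the result,
  // so text sits in the even slots and markers in the odd slots.
  const alternation = Object.keys(settings.markers).join('|');
  const pattern = settings.spaces
    ? new RegExp('\\s+(' + alternation + ')\\s+')
    : new RegExp('(' + alternation + ')');
  const pieces = rest.trim().split(pattern);

  const args = {};
  pieces.forEach((piece, i) => {
    if (i % 2 === 0) return; // skip the text slots
    const role = settings.markers[piece];
    const text = settings.markersFollow ? pieces[i - 1] : pieces[i + 1];
    args[role] = text;
  });
  // When markers precede their arguments, leading unmarked text is the object.
  if (!settings.markersFollow) {
    if (pieces[0] !== '') args.object = pieces[0];
  }
  return { verb, args };
}

const english = parse('translate hello from English to Spanish', languages.en, ['translate']);
const japanese = parse('helloをEnglishからSpanishにtranslate', languages.ja, ['translate']);
console.log(english.args, japanese.args);
```

<p>Under these assumptions both inputs come out with the same verb and the same role assignments (<code>object</code>: hello, <code>source</code>: English, <code>goal</code>: Spanish), which is what lets a command written against roles work with either language.</p>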

<p>Note also that Japanese (as in example (2)) is verb-final and uses no spaces between words. We&#8217;ve tried to make Parser 2 itself agnostic to these kinds of cross-linguistic variation.</p>

<p>Parser 2 also adds <a href="http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/">better argument-first suggestions</a>, inspired by some <a href="http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/">earlier thoughts on Ubiquity in Japanese</a>. Ubiquity will now start to parse arguments in the input even if a verb isn&#8217;t found, and suggest verbs based on that input. For example, if you enter &#8220;hello to Spanish,&#8221; it&#8217;ll recognize that you have an <em>object</em> of &#8220;hello&#8221; and a <em>goal</em> of &#8220;Spanish&#8221; which can be understood as a language name, so it&#8217;ll suggest the verb &#8220;translate.&#8221; This is the way it should be. <img src='http://mitcho.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
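
<p>To make the argument-first idea concrete, here is a minimal sketch, assuming each verb declares which roles it takes and which nountype each role expects. The registry shape, the <code>email</code> verb, and the toy nountypes are hypothetical, and this version only filters verbs rather than scoring and ranking them.</p>

```javascript
// Toy nountypes: each one just says whether it accepts a piece of text.
const languageNames = ['english', 'spanish', 'french', 'japanese'];
const nountypes = {
  arbitraryText: (text) => text.length !== 0,
  language: (text) => languageNames.includes(text.toLowerCase()),
  contact: (text) => text.includes('@'),
};

// Hypothetical verb registry: role name -> name of the nountype it expects.
const verbs = [
  { name: 'translate', roles: { object: 'arbitraryText', source: 'language', goal: 'language' } },
  { name: 'email', roles: { object: 'arbitraryText', goal: 'contact' } },
];

// Suggest every verb that has a slot for each parsed role
// and whose nountype for that slot accepts the argument text.
function suggestVerbs(args) {
  return verbs
    .filter((verb) => Object.entries(args).every(([role, text]) => {
      const typeName = verb.roles[role];
      const accepts = nountypes[typeName];
      return accepts === undefined ? false : accepts(text);
    }))
    .map((verb) => verb.name);
}

// The arguments parsed from "hello to Spanish", with no verb in the input.
const suggestions = suggestVerbs({ object: 'hello', goal: 'Spanish' });
console.log(suggestions);
```

<p>Here &#8220;Spanish&#8221; is accepted as a language name but not as a contact, so only &#8220;translate&#8221; survives the filter.</p>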

<p><small>For more information and background, feel free to check out some of my previous blog posts <a href="http://mitcho.com/blog/tag/ubiquity+parser/">on the new parser</a> and <a href="http://mitcho.com/blog/tag/ubiquity+linguistics/">on the different linguistic considerations</a>. I also have a four-page academic paper giving an overview of some innovations in the parser—email me at <code>x@x.com</code> where <code>x=mitcho</code> if you&#8217;d like to get a copy.</small></p>

<h4>Internationalization of bundled commands</h4>

<p>The move to use <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2/Semantic_Roles">semantic roles</a> in the <a href="https://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_Source_Tip_Author_Tutorial">new command API</a>, described above, means that the same Ubiquity command code can be used with inputs in different languages. Two things are left, then, to make a completely localized input work: (1) translation (localization) of different strings in the commands and (2) localization of the nountypes.</p>

<p>In Ubiquity 0.5, we built a localization infrastructure for commands (1, above) but have not yet tackled the nountypes (2). Ubiquity 0.5 uses the <a href="http://en.wikipedia.org/wiki/gettext">gettext</a> <code>po</code> (portable object) file format for localizations, which many localizers in the UNIX world are very familiar with. This <a href="http://groups.google.com/group/ubiquity-i18n/browse_thread/thread/79c7cea117ad04bb#">choice of file format</a> potentially opens Ubiquity localization up to many who are new to localization or are unfamiliar with other Mozilla localization. Ubiquity is able to produce localization templates by itself and we also have <a href="http://geeksbynature.dk/?p=35">a great tool</a> to check the completeness of different localizations.</p>

<p>A huge caveat, however, is that this localization support currently only works with the commands bundled with Ubiquity itself.</p>

<h3>What&#8217;s Next</h3>

<p>We&#8217;re going to continue working to make Ubiquity <a href="http://mitcho.com/blog/projects/how-natural-should-a-natural-interface-be/">more natural</a> for more users. The tasks we have ahead of us are the localization of nountypes and community commands.</p>

<h4>Nountype localization</h4>

<p>With the new semantic role argument specifications, command localization simply became a question of translating some strings, which many localizers are used to. After all, we want localizations to affect the <em>presentation</em> of commands, not the logic of the commands. When it comes to nountypes, however, it is quite possible that we would actually want a nountype&#8217;s localization to affect its behavior.</p>

<p>Consider, for example, an imaginary <code>day_of_the_week</code> nountype. In English, this nountype might accept or suggest strings like &#8220;Monday&#8221; or &#8220;Tuesday,&#8221; while a French localization would accept &#8220;lundi&#8221; or &#8220;mardi.&#8221; More complicated still, consider a <code>date</code> nountype. In English this nountype may have custom logic to parse strings like &#8220;June 1st&#8221; while another language may have to parse very different kinds of strings. These nountype localizations thus involve not just string translations, but actual changes in their <em>logic</em>, making the <code>po</code> format approach we took to command localization a poor fit.</p>
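
<p>A small sketch may help show why string translation alone is not enough here. The <code>dateNountype</code> object below is purely hypothetical (it is not a Ubiquity API): it holds one parsing function per locale, and the English and French versions differ in word order and ordinal marking as well as in month names.</p>

```javascript
const months = {
  en: ['january', 'february', 'march', 'april', 'may', 'june', 'july',
    'august', 'september', 'october', 'november', 'december'],
  fr: ['janvier', 'février', 'mars', 'avril', 'mai', 'juin', 'juillet',
    'août', 'septembre', 'octobre', 'novembre', 'décembre'],
};

// Hypothetical date nountype: one parse function per locale.
const dateNountype = {
  // English puts the month first and may add an ordinal suffix: "June 1st".
  en(input) {
    const m = input.match(/^([a-z]+) (\d+)(?:st|nd|rd|th)?$/i);
    if (m === null) return null;
    const month = months.en.indexOf(m[1].toLowerCase()) + 1;
    return month === 0 ? null : { month, day: Number(m[2]) };
  },
  // French puts the day first and marks only the first of the month: "1er juin".
  fr(input) {
    const m = input.match(/^(\d+)(?:er)? ([a-zéû]+)$/i);
    if (m === null) return null;
    const month = months.fr.indexOf(m[2].toLowerCase()) + 1;
    return month === 0 ? null : { month, day: Number(m[1]) };
  },
};

console.log(dateNountype.en('June 1st'), dateNountype.fr('1er juin'));
```

<p>Both calls yield the same month and day, but getting there takes different code for each language, which is exactly what a <code>po</code> file of translated strings cannot express.</p>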

<p>Making nountypes localizable, however, will make Ubiquity significantly more &#8220;natural&#8221; for many users. In the coming weeks and months we&#8217;ll be discussing and debating different options to accomplish this.</p>

<h4>Community command localization</h4>

<p>Even though the file format and infrastructure for command localization itself have been fleshed out with Ubiquity 0.5, the distributed nature of all these community commands adds a further complication. Do we want community command localizations to be completely distributed, or should they be centralized? If they&#8217;re distributed, how do you find them? These are the types of questions we&#8217;ll need to ask and answer. The ease of creating a new Ubiquity command and sharing it with the world is a huge asset of the platform, so we&#8217;ll definitely be thinking about how best to localize these community commands as well. In the next day or two I&#8217;ll be writing up a more detailed blog post on what we need from a good community command localization solution.</p>

<h3>Summary</h3>

<p>For the more visually inclined (including myself), here&#8217;s a handy diagram to summarize what components are localizable now, what will be in the future, and what this means for Ubiquity users of different languages.</p>

<table>
<tr><th rowspan='2'>localized components</th><th rowspan='2'>Japanese input that Ubiquity will understand</th><th colspan='2'>support coverage</th></tr>
<tr><th>for bundled commands</th><th>for community commands</th></tr>
<tr><th><i>no localization</i></th><td>translate hello from English to Spanish</td><th rowspan='3' style='background: #99ff99'>Ubiquity 0.5!</th><th rowspan='2' style='background: #99ff99'>Ubiquity 0.5!</th></tr>
<tr><th>parser</th><td>helloをEnglishからSpanishにtranslate</td></tr>
<tr><th>parser + verbs</th><td>helloをEnglishからSpanishに訳す</td><th rowspan='2' style='background: #f99'><i>the future</i></th></tr>
<tr><th>parser + verbs + nountypes</th><td>helloを英語からスペイン語に訳す</td><th rowspan='1' style='background: #f99'><i>the future</i></th></tr>
</table>

<h3>Get Involved</h3>

<p>Whether you&#8217;re a speaker of an as-yet unsupported language, a veteran localization contributor, or simply interested in seeing how we can offer this natural language interface to more languages and more users, there are lots of ways to get involved. If you have some JavaScript experience and want to teach Ubiquity your native language&#8217;s grammar, read our <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2/Localization_Tutorial">parser localization tutorial</a>. If you would like to contribute localizations for our built-in commands, there&#8217;s a <a href="https://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_0.5_Command_Localization_Tutorial">command localization tutorial</a>. To discuss how best to localize nountypes and community commands, or to ask questions about or discuss command and parser localization, join us on the <a href="http://groups.google.com/group/ubiquity-i18n">Ubiquity-i18n mailing list</a>.</p>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/localizing-commands-for-ubiquity-0-5/' rel='bookmark' title='Localizing Commands for Ubiquity 0.5'>Localizing Commands for Ubiquity 0.5</a></li>
<li><a href='http://mitcho.com/blog/projects/localizing-ubiquity-commands-and-nountypes/' rel='bookmark' title='Localizing Ubiquity: commands and nountypes'>Localizing Ubiquity: commands and nountypes</a></li>
<li><a href='http://mitcho.com/blog/projects/big-issues-and-small-issues-with-parser-2/' rel='bookmark' title='Big Issues and Small Issues with Parser 2'>Big Issues and Small Issues with Parser 2</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/ubiquity-localization-whats-new-whats-next/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Ubiquity presentation at Tokyo 2.0</title>
		<link>http://mitcho.com/blog/projects/ubiquity-presentation-at-tokyo-20/</link>
		<comments>http://mitcho.com/blog/projects/ubiquity-presentation-at-tokyo-20/#comments</comments>
		<pubDate>Wed, 10 Jun 2009 09:54:13 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[life]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[bilingual]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[English]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[GoaP]]></category>
		<category><![CDATA[Japan]]></category>
		<category><![CDATA[Japanese language]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[screencast]]></category>
		<category><![CDATA[Tokyo]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2203</guid>
		<description><![CDATA[This past Monday I presented at Tokyo 2.0, Japan&#8217;s largest bilingual web/tech community. I presented as part of a session on The Web and Language, which I also helped organize. Other presenters included Junji Tomita from goo Labs, Shinjyou Sunao of Knowledge Creation, developers of the Voice Delivery System API, and Chris Salzberg of Global [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/changes-to-ubiquity-parser-2-and-the-playpen/' rel='bookmark' title='Changes to Ubiquity Parser 2 and the Playpen'>Changes to Ubiquity Parser 2 and the Playpen</a></li>
<li><a href='http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/' rel='bookmark' title='Foxkeh demos Ubiquity Parser: The Next Generation'>Foxkeh demos Ubiquity Parser: The Next Generation</a></li>
<li><a href='http://mitcho.com/blog/life/notes-from-barcamp-tokyo-2009/' rel='bookmark' title='Notes from BarCamp Tokyo 2009'>Notes from BarCamp Tokyo 2009</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/06/t2p01.png" alt="T2P0.PNG" border="0" width="211" height="120" /></p>

<p>This past Monday I presented at <a href="http://www.tokyo2point0.net/events/tokyo-20-25-the-web-language">Tokyo 2.0</a>, Japan&#8217;s largest bilingual web/tech community. I presented as part of a session on The Web and Language, which I also helped organize. Other presenters included Junji Tomita from <a href="http://labs.goo.ne.jp/intl/">goo Labs</a>, Shinjyou Sunao of <a href="http://www.knowlec.com/">Knowledge Creation</a>, developers of the <a href="http://www.vdsapi.ne.jp/">Voice Delivery System</a> API, and <a href="http://globalvoicesonline.org/author/chris-salzberg/">Chris Salzberg</a> of <a href="http://globalvoicesonline.org/">Global Voices Online</a> on community translation.</p>

<p>I just put together a video of my Ubiquity presentation, mixing <a href="http://www.ustream.tv/recorded/1625213">the audio recorded live</a> at the presentation together with a screencast of my slides for better visibility. The presentation is 10 minutes long and is bilingual, English and Japanese.</p>

<p><object width="649" height="365"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=5091071&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=5091071&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="649" height="365"></embed></object><br /><a href="http://vimeo.com/5091071">Ubiquity: Command the Web with Language 言葉で操作する Web</a> from <a href="http://vimeo.com/mitchoyoshitaka">mitcho</a> on <a href="http://vimeo.com">Vimeo</a>.</p>

<p><span id="more-2203"></span>
The event also coincided with <a href="http://www.linkedin.com/in/davemcclure">Dave McClure&#8217;s</a> <a href="http://www.geeksonaplane.com/">Geeks on a Plane</a> Asia tour, attracting even more interest to the event. In the end it was the largest Tokyo 2.0 event ever.</p>

<p>As I <a href="http://twitter.com/mitchoyoshitaka/status/1980687478">leave Tokyo next month</a>, I&#8217;ll be sad not to be able to stay a part of Tokyo 2.0. I&#8217;ve met a lot of fascinating people and learned a lot at the monthly events. I&#8217;ll definitely make sure to work them into my future travels back to Japan, and I highly recommend that any of you who travel to Tokyo do so as well.</p>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/changes-to-ubiquity-parser-2-and-the-playpen/' rel='bookmark' title='Changes to Ubiquity Parser 2 and the Playpen'>Changes to Ubiquity Parser 2 and the Playpen</a></li>
<li><a href='http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/' rel='bookmark' title='Foxkeh demos Ubiquity Parser: The Next Generation'>Foxkeh demos Ubiquity Parser: The Next Generation</a></li>
<li><a href='http://mitcho.com/blog/life/notes-from-barcamp-tokyo-2009/' rel='bookmark' title='Notes from BarCamp Tokyo 2009'>Notes from BarCamp Tokyo 2009</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/ubiquity-presentation-at-tokyo-20/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Solving a Romantic Problem: Portmanteau&#8217;ed Prepositions</title>
		<link>http://mitcho.com/blog/projects/solving-a-romantic-problem-portmanteaued-prepositions/</link>
		<comments>http://mitcho.com/blog/projects/solving-a-romantic-problem-portmanteaued-prepositions/#comments</comments>
		<pubDate>Mon, 11 May 2009 05:19:17 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[Catalan]]></category>
		<category><![CDATA[French]]></category>
		<category><![CDATA[Italian]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[portmanteau]]></category>
		<category><![CDATA[romance languages]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2019</guid>
		<description><![CDATA[The problem: In many Romance languages, prepositions and articles often form portmanteau morphs, combining to form a single word. Some examples include (French) à + le > au, de + le > du, (Catalan) a + el > al, de + els > dels, per + el > pel. Italian has a particularly productive system [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/observation/wheres-the-verb/' rel='bookmark' title='Where&#8217;s The Verb?'>Where&#8217;s The Verb?</a></li>
<li><a href='http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/' rel='bookmark' title='Adding Your Language to Ubiquity Parser 2'>Adding Your Language to Ubiquity Parser 2</a></li>
<li><a href='http://mitcho.com/blog/observation/scoring-and-ranking-suggestions/' rel='bookmark' title='Scoring and Ranking Suggestions'>Scoring and Ranking Suggestions</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<h3>The problem:</h3>

<p>In many <a href="http://en.wikipedia.org/wiki/romance languages">Romance languages</a>, prepositions and articles often form <a href="http://en.wikipedia.org/wiki/portmanteau">portmanteau morphs</a>, combining to form a single word.<sup id="fnref:2"><a href="#fn:2" rel="footnote">1</a></sup> Some examples include (French) à + le > au, de + le > du, (Catalan) a + el > al, de + els > dels, per + el > pel. Italian has a particularly productive system of portmanteau&#8217;ed prepositions and articles&#8230; I refer you to the <a href="http://en.wikipedia.org/wiki/Contraction (grammar)#Italian">contraction</a> article on Wikipedia.</p>

<p>As I <a href="http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/">noted a couple weeks ago</a>, however, some combinations do not form portmanteaus.<sup id="fnref:1"><a href="#fn:1" rel="footnote">2</a></sup></p>

<p><span id="more-2019"></span>
<strong>French:</strong></p>

<ol>
<li>à + le > au</li>
<li>à + la > à la</li>
</ol>

<p>The problem with this is that if we use both <em>à</em> and <em>au</em> as delimiters, we may end up passing the definite article to the verb as part of the argument in some cases, but not in other cases.</p>

<ol>
<li>&#8220;<strong>à</strong> la table&#8221; = &#8220;<strong>to</strong> the table&#8221;</li>
<li>&#8220;<strong>au</strong> chat&#8221; = &#8220;<strong>to the</strong> cat&#8221;</li>
</ol>

<h3>The solution:</h3>

<p>The solution is a new step in <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2">the Parser 2 process</a> which normalizes the form of arguments. Each language&#8217;s parser can now optionally define a <code>normalizeArgument()</code> method which takes an argument and returns a list of normalized alternates. Normalized arguments are returned in the form of <code>{prefix: '', newInput: '', suffix: ''}</code>. For example, if you feed &#8220;la table&#8221; to the French <code>normalizeArgument()</code>, it ought to return</p>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#123;</span>prefix<span style="color: #339933;">:</span> <span style="color: #3366CC;">'la '</span><span style="color: #339933;">,</span> newInput<span style="color: #339933;">:</span> <span style="color: #3366CC;">'table'</span><span style="color: #339933;">,</span> suffix<span style="color: #339933;">:</span> <span style="color: #3366CC;">''</span><span style="color: #009900;">&#125;</span><span style="color: #009900;">&#93;</span></pre></div></div>


<p>If there are no possible normalizations, <code>normalizeArgument()</code> should simply return <code>[]</code>. Each alternative returned by <code>normalizeArgument()</code> is substituted into a copy of the possible parses just before nountype detection. The prefixes and suffixes are stored in the argument (as <code>inactivePrefix</code> and <code>inactiveSuffix</code>) so they can be incorporated into the suggestion display.</p>
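
<p>The substitution step can be sketched roughly as follows. This is an illustration under assumptions rather than the parser&#8217;s real code: the parse shape, the <code>expandParse</code> helper, and the stub normalizer are all hypothetical, and I assume the original parse is kept alongside the normalized copies.</p>

```javascript
// Hypothetical parse shape:
//   { verb, args: { roleName: { input, inactivePrefix, inactiveSuffix } } }
function expandParse(parse, role, normalizeArgument) {
  const original = parse.args[role];
  // Build one copy of the parse per normalized alternate of this argument.
  const copies = normalizeArgument(original.input).map((alt) => {
    const normalized = {
      input: alt.newInput,
      inactivePrefix: alt.prefix, // kept so the suggestion display can show it
      inactiveSuffix: alt.suffix,
    };
    return { verb: parse.verb, args: { ...parse.args, [role]: normalized } };
  });
  // Assumption: the unnormalized parse stays in the running as well.
  return [parse, ...copies];
}

// Stub normalizer that only knows the "la table" example from the text.
const stubNormalize = (input) =>
  (input === 'la table' ? [{ prefix: 'la ', newInput: 'table', suffix: '' }] : []);

const parses = expandParse(
  { verb: 'move', args: { goal: { input: 'la table', inactivePrefix: '', inactiveSuffix: '' } } },
  'goal',
  stubNormalize,
);
console.log(parses.map((p) => p.args.goal));
```

<p>Nountype detection would then run on every parse in the returned list, so a nountype gets a chance at both &#8220;la table&#8221; and the bare &#8220;table&#8221;.</p>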

<p>Here, for example, is how the inactive prefix &#8220;l&#8217;&#8221; is displayed in <a href="chrome://parser-demo/content/index.html">the parser demo</a>. This way the user is told that the &#8220;l&#8217;&#8221; prefix is being ignored, and the nountype detection and verb action can act on the argument &#8220;English&#8221;.<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup> (In the future, of course, we could teach this nountype to accept the Catalan &#8220;anglès&#8221;.)</p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/05/picture-1.png" alt="Picture 1.png" border="0" width="320" height="29" /></center></p>

<p>The easiest way to produce this output is to use the <a href="https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/String/match"><code>String.match()</code></a> method. For sample <code>normalizeArgument()</code> code, see the <a href="http://ubiquity.mozilla.com/hg/ubiquity-firefox/file/12f5d9abf011/ubiquity/modules/parser/new/ca.js">Catalan</a> and <a href="http://ubiquity.mozilla.com/hg/ubiquity-firefox/file/12f5d9abf011/ubiquity/modules/parser/new/fr.js">French</a> parser files.</p>
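
<p>For a sense of what such a method might look like, here is a minimal sketch of a French-style <code>normalizeArgument()</code> built on <code>match()</code>. It is written from scratch for illustration and is not the code in the linked parser files; the list of articles it strips is my own assumption.</p>

```javascript
// Strip a leading definite article and report it as an inactive prefix.
function normalizeArgument(input) {
  // Group 1 is the article (with its trailing space or apostrophe),
  // group 2 is the rest of the argument.
  const matches = input.match(/^(le |la |les |l['’])(.+)$/i);
  if (matches === null) return []; // no possible normalizations
  return [{ prefix: matches[1], newInput: matches[2], suffix: '' }];
}

console.log(normalizeArgument('la table'));
console.log(normalizeArgument("l'English"));
console.log(normalizeArgument('chat'));
```

<p>The first call returns a single alternate with prefix <code>'la '</code> and new input <code>'table'</code>, and the last returns an empty list because there is no article to strip.</p>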

<p>I hope that this solution will help make Ubiquity with Parser 2 feel <a href="http://mitcho.com/blog/projects/how-natural-should-a-natural-interface-be/">more natural</a> for many Romance languages.</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:2">
<p>Thanks to <a href="http://people.ucsc.edu/~jpobrien/">Jeremy O&#8217;Brien</a> for helping me figure out how to refer to this phenomenon.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:1">
<p>This also relates to the issue of <a href="http://ubiquity.mozilla.com/trac/ticket/671">parsing multi-word delimiters</a>, though the argument normalization strategy covered here should reduce the necessity of multi-word delimiters.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:3">
<p>Thank you to contributor <a href="http://www.cau.cat/blog/">Toni Hermoso Pulido</a> for our first attempt at a Catalan parser!&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/observation/wheres-the-verb/' rel='bookmark' title='Where&#8217;s The Verb?'>Where&#8217;s The Verb?</a></li>
<li><a href='http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/' rel='bookmark' title='Adding Your Language to Ubiquity Parser 2'>Adding Your Language to Ubiquity Parser 2</a></li>
<li><a href='http://mitcho.com/blog/observation/scoring-and-ranking-suggestions/' rel='bookmark' title='Scoring and Ranking Suggestions'>Scoring and Ranking Suggestions</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/solving-a-romantic-problem-portmanteaued-prepositions/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>In Case of Case&#8230;</title>
		<link>http://mitcho.com/blog/projects/in-case-of-case/</link>
		<comments>http://mitcho.com/blog/projects/in-case-of-case/#comments</comments>
		<pubDate>Wed, 06 May 2009 09:54:53 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[Arabic]]></category>
		<category><![CDATA[Basque]]></category>
		<category><![CDATA[case]]></category>
		<category><![CDATA[German]]></category>
		<category><![CDATA[Latin]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[Polish]]></category>
		<category><![CDATA[Turkish]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1994</guid>
		<description><![CDATA[A recently hot topic of discussion in the Ubiquity i18n realm has been how to deal with strongly case-marking languages. As we continue to make steady progress, this is one of the remaining open questions that we must decide as a community how to tackle in Parser 2. Introduction Grammatical case is a marking on nouns [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/' rel='bookmark' title='Three ways to argue over arguments'>Three ways to argue over arguments</a></li>
<li><a href='http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/' rel='bookmark' title='Contribute: how your language identifies its arguments'>Contribute: how your language identifies its arguments</a></li>
<li><a href='http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/' rel='bookmark' title='Writing commands with semantic roles'>Writing commands with semantic roles</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>A recently hot topic of discussion in the <a href="https://wiki.mozilla.org/Labs/Ubiquity/i18n">Ubiquity i18n</a> realm has been <a href="http://groups.google.com/group/ubiquity-i18n/browse_thread/thread/ab4d876b1ea02d4">how to deal with strongly case-marking languages</a>. As we continue to make <a href="http://ubiquity.mozilla.com/hg/ubiquity-firefox/log?rev=new-parser">steady progress</a>, this is one of the remaining open questions that we must decide as a community how to tackle in Parser 2.</p>

<h3>Introduction</h3>

<p><a href="http://en.wikipedia.org/wiki/Grammatical case">Grammatical case</a> is a marking on nouns that expresses grammatical function. Not all languages exhibit case. In many of the Indo-European languages we hope to bring Ubiquity to, case is realized as a suffix.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<p>Here&#8217;s a classic example of case from <a href="http://en.wikipedia.org/wiki/Latin">Latin</a>. (Line 2 is the gloss of 1, line 4 of 3.)</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
</pre></td><td class="code"><pre class="la" style="font-family:monospace;">canis      virum      momordit
dog=sg.NOM man=sg.ACC bite=3sg.perfect
vir        canem      momordit
man=sg.NOM dog=sg.ACC bite=3sg.perfect</pre></td></tr></table></div>


<p>Example (1) is &#8220;the dog bit the man,&#8221; while example (3) is &#8220;the man bit the dog.&#8221; The only difference, as you see in the gloss, is that the nouns <em>canis</em> and <em>vir</em> are marked with different case endings in the two sentences. By marking the nouns with different cases (here, <a href="http://en.wikipedia.org/wiki/nominative">nominative</a> and <a href="http://en.wikipedia.org/wiki/accusative">accusative</a>), their semantic roles in the sentence (which is the biter and which is the bitee) can be identified unambiguously. (Their positions are also switched in these examples, but in reality Latin has a very free word order: the same sentences with other word orders, including OSV or VSO, are also common.)</p>

<p>At first glance, strongly case-marked languages may look like a godsend for <a href="http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/">identifying the semantic roles of arguments</a>.<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup> If we can easily and unambiguously recognize arguments&#8217; cases to put them in their appropriate semantic roles, this could simplify processing as well as make Ubiquity input follow a <a href="http://mitcho.com/blog/projects/how-natural-should-a-natural-interface-be/">natural syntax</a> for such languages. Unfortunately, there are some significant challenges which must be overcome in order to make the processing of case-markers worthwhile.</p>

<p><span id="more-1994"></span></p>

<h3>The case against case</h3>

<p>There are broadly three different difficulties with dealing with strongly case-marking languages: (1) how to identify case correctly, (2) how to identify the boundaries of the arguments, and (3) what case to use when handing the arguments to the verb&#8217;s preview and execution.</p>

<h4>Parsing for case</h4>

<p>In some languages, it is very easy to recognize different case endings. For example, for Turkish it would be relatively easy to write a regular expression for each of the cases below, even with the <a href="http://en.wikipedia.org/wiki/vowel harmony">vowel harmony</a> as exhibited in the genitive and accusative cases between <em>i</em> and <em>ü</em>.<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup></p>

<table>
<tr>
<th>Case</th>
<th>Ending</th>
<th><i>köy</i> &#8220;village&#8221;</th>
<th>Meaning</th>
</tr>
<tr>
<td>Nominative</td>
<td>Ø (none)</td>
<td><i>köy</i></td>
<td>village</td>
</tr>
<tr>
<td>Genitive</td>
<td><i>-in</i></td>
<td><i>köyün</i></td>
<td>the village&#8217;s<br />
of the village</td>
</tr>
<tr>
<td>Dative</td>
<td><i>-e</i></td>
<td><i>köye</i></td>
<td>to the village</td>
</tr>
<tr>
<td>Accusative</td>
<td><i>-i</i></td>
<td><i>köyü</i></td>
<td>the village</td>
</tr>
<tr>
<td>Ablative</td>
<td><i>-den</i></td>
<td><i>köyden</i></td>
<td>from the village</td>
</tr>
<tr>
<td>Locative</td>
<td><i>-de</i></td>
<td><i>köyde</i></td>
<td>in the village</td>
</tr>
</table>

<p>(Example from <a href="http://en.wikipedia.org/wiki/Turkish language">Turkish language</a> on Wikipedia.)</p>
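
<p>To make this concrete, here is a naive sketch of such regular expressions for the endings in the table. The pattern list and the function name <code>guessTurkishCase</code> are hypothetical; the sketch looks only at the final letters of a word, covers the vowel-harmony variants of each ending, and ignores buffer consonants and every other complication of real Turkish morphology.</p>

```javascript
// Naive sketch: guess the case of a Turkish noun from its ending alone.
// Patterns are tried in order and the first match wins, so the longer
// endings (ablative, locative) are listed before the shorter ones.
var TURKISH_CASES = [
  { name: 'ablative',   pattern: /[dt][ae]n$/ }, // -den / -dan (-ten / -tan)
  { name: 'locative',   pattern: /[dt][ae]$/ },  // -de / -da (-te / -ta)
  { name: 'genitive',   pattern: /[ıiuü]n$/ },   // -ın / -in / -un / -ün
  { name: 'dative',     pattern: /[ae]$/ },      // -a / -e
  { name: 'accusative', pattern: /[ıiuü]$/ }     // -ı / -i / -u / -ü
];

function guessTurkishCase(word) {
  var matching = TURKISH_CASES.filter(function (c) {
    return c.pattern.test(word);
  });
  // No recognizable ending: treat the word as an unmarked nominative.
  return matching.length ? matching[0].name : 'nominative';
}
```

<p>Run on the forms in the table, this sketch labels each one with the case in its row. It would also happily mislabel any bare noun that merely happens to end in one of these letter sequences, which is exactly the kind of ambiguity discussed next.</p>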

<p>However, in many other languages identifying case affixes can be quite difficult as they vary greatly depending on the root noun, not to mention irregular declensions. For example, in Polish the nominative <em>student</em> becomes <em>studenta</em> in the accusative, which may look like a simple suffix, but the nominative <em>pies</em> (&#8220;dog&#8221;) becomes <em>psa</em> while <em>stół</em> (&#8220;table&#8221;) remains unchanged.<sup id="fnref:4"><a href="#fn:4" rel="footnote">4</a></sup> Writing rules for these differing (and sometimes ambiguous) case-marking paradigms without building in lexical information would be very difficult indeed.</p>

<h4>Finding the edges</h4>

<p>Recall that the current <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2">Ubiquity Parser 2 design</a> identifies arguments by identifying known delimiters (most often some adposition) as a left or right edge of an argument. By not having to run the nountype detection over every substring of the input, we greatly reduce the processing time needed in each parse. This approach, however, relies on our being able to reliably identify some sort of boundary for each of our arguments.</p>

<p>In strongly case-marking languages, the case is realized on the noun itself, but this noun may be buried in the middle of the noun phrase. Even if we could reliably identify the case-marker, it would mark neither the left nor right edge of the argument, making our current parsing strategy worthless. For example, consider the following Arabic example of &#8220;the house of the man&#8221; in nominative and accusative cases:<sup id="fnref:6"><a href="#fn:6" rel="footnote">5</a></sup></p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>5
6
7
8
</pre></td><td class="code"><pre class="ar" style="font-family:monospace;">baytu 'r-rajuli
house=NOM of=man
bayta 'r-rajuli
house=ACC of=man</pre></td></tr></table></div>


<p>In these cases, we see that the only distinction between (5) (بَيتُ الرَّجُلِ) and (7) (بَيتَ الرَّجُلِ) is the case suffix on the head noun, &#8220;house,&#8221; which sits in the middle of the noun phrase. Even if we could properly identify this case ending, it would mark neither the left nor the right edge of the entire argument.</p>

<p>Contrast this with German where, even though arguments have case, the case is realized on the article, not on the noun head itself, so we can essentially deal with these articles as prepositions, using the article as the left edge of the argument.</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>5
6
7
8
</pre></td><td class="code"><pre class="de" style="font-family:monospace;">den     großen Hund
the=ACC big    dog
dem     großen Hund
the=DAT big    dog</pre></td></tr></table></div>
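
<p>A minimal sketch of this idea follows, using only the two articles from the example above. The <code>ARTICLE_CASES</code> table and the <code>splitOnArticle</code> function are hypothetical names, and a real table would have to cope with the fact that many German articles are ambiguous between cases.</p>

```javascript
// Hypothetical mapping from case-marked articles to the case they signal.
// Only the two articles from the example are listed here.
var ARTICLE_CASES = { den: 'accusative', dem: 'dative' };

// Treat a leading case-marked article like a preposition: it marks the left
// edge of the argument and tells us the case of everything that follows.
function splitOnArticle(phrase) {
  var words = phrase.trim().split(/\s+/);
  var article = words[0].toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(ARTICLE_CASES, article)) {
    return null; // no known article at the left edge
  }
  return {
    grammaticalCase: ARTICLE_CASES[article],
    argument: words.slice(1).join(' ')
  };
}
```

<p>With this sketch, <code>splitOnArticle('den großen Hund')</code> reports the accusative with the argument <code>'großen Hund'</code>, and the same phrase with <em>dem</em> reports the dative.</p>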


<p>Believe it or not, things can actually get even worse than just not being able to find an edge of our arguments. The worst-case scenario comes from discontinuous constituents, in languages where case marking on both nouns and modifiers allows for very free word order. Latin is just such a language:<sup id="fnref:5"><a href="#fn:5" rel="footnote">6</a></sup></p>

<p>From M. Tullius Cicero, &#8220;Against Catiline,&#8221; chapter 1:<br/>


<div class="wp_syntax"><div class="code"><pre class="la" style="font-family:monospace;">quem        ad finem      sese effrenata                       iactabit         audacia?
what=sg.ACC to extent=ACC self unbridle=perf-past-part.3sg.NOM fling=future.3sg audacity=sg.NOM</pre></div></div>


<br/>
&#8220;To what extent will (your) unbridled audacity fling itself about?&#8221;
</p>

<p>In this example, <em>effrenata</em> modifies <em>audacia</em>, but the two do not form a unit in the linear order; their relationship can be recovered only because both words carry nominative case marking. It would be unfair to expect Ubiquity ever to parse such arguments properly, so some discipline would be required from the user, but the example illustrates how bad things could get if we took the processing of case-markers to the extreme.</p>

<h4>The proper case for execution</h4>

<p>The final difficulty in processing case-markings in Ubiquity comes from the preview and execution stages of a Ubiquity command&#8217;s usage. That is, after we parse the input, we must give the verb the arguments we found so that it can display a meaningful preview or behave correctly when executed. At this point, what case should the noun be when we hand the string of the argument to the verb?</p>

<p><a href="http://www.flickr.com/photos/43567335@N00/275046371/" title="CAVE CANEM" target="_blank"><img src="http://farm1.static.flickr.com/120/275046371_9080289d04.jpg" alt="CAVE CANEM" border="0" /></a><br /><small><a href="http://creativecommons.org/licenses/by-sa/2.0/" title="Attribution-ShareAlike License" target="_blank"><img src="http://mitcho.com/blog/wp-content/plugins/photo-dropper/images/cc.png" alt="Creative Commons License" border="0" width="16" height="16" align="absmiddle" /></a> <a href="http://www.photodropper.com/photos/" target="_blank">photo</a> credit: <a href="http://www.flickr.com/photos/43567335@N00/275046371/" title="Platinatore" target="_blank">Platinatore</a></small></p>

<p>For example, consider the Latin expression &#8220;cave canem!&#8221; meaning &#8220;beware the dog!&#8221;</p>


<div class="wp_syntax"><div class="code"><pre class="la" style="font-family:monospace;">cave              canem!
beware=imperative dog=sg.ACC</pre></div></div>


<p>Supposing for a moment that we&#8217;ve implemented the <em>cavere</em> (&#8220;beware&#8221;) verb in Ubiquity and properly parsed &#8220;cave canem,&#8221; should we pass the literal string &#8220;canem&#8221; in accusative case to the verb, or the nominative string &#8220;canis,&#8221; or the root &#8220;can-&#8221;? Which is more appropriate? If &#8220;canis&#8221; is the more appropriate choice, Ubiquity would then have to be responsible for declining the accusative into a nominative&#8230; for all case-marked languages. This is clearly a road we do not want to go down.</p>

<h3>Proposal: only support determiners and adpositions</h3>

<p>I&#8217;ve laid out three reasons why processing strongly case-marked languages in Ubiquity is a non-starter. Fortunately, languages often have multiple different strategies for accomplishing similar communicative tasks. One oft-used strategy for <a href="http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/">marking different roles of arguments</a> is the use of <strong>adpositions</strong> (a fancy term for prepositions and postpositions). Unlike case-markers which often are affixes on nouns, prepositions mark the beginning of an argument and postpositions the end, as is used in the current parsing strategy.</p>

<p>From a formal/theoretical perspective, adpositions sit above the noun phrase proper, while modifiers like adjectives live within the noun phrase. This reflects the fact that, with few exceptions, adpositions mark an edge of the noun phrase, which is crucial to our parsing strategy. (Here, PP is a prepositional phrase and NP is a noun phrase.) Note also that for languages such as German which mark case on determiners (D), the same logic holds.</p>

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/05/dcaa2cd9-4c7b-45fd-8a44-75c25b1b5561.jpg" alt="DCAA2CD9-4C7B-45FD-8A44-75C25B1B5561.jpg" border="0" width="126" height="106" style='vertical-align:middle;padding:5px;' /><img src="http://mitcho.com/blog/wp-content/uploads/2009/05/936098e0-425b-43e1-8cec-d188d43cc942.jpg" alt="936098E0-425B-43E1-8CEC-D188D43CC942.jpg" border="0" width="170" height="134" style='vertical-align:middle;padding:5px;' /></p>

<p>Note also that, as long as the case-marking is phrase-marking (i.e. marking the edge of the noun phrase) rather than just affixing to the head noun, our parsing strategy will work. This means we could possibly in the future write a simple RegExp to split off the Basque dative suffix, as it marks the end of the entire noun phrase. This can be seen in the following data from <a href="http://www.loria.fr/~tseng/Pubs/lsk04.pdf">Tseng (2004)</a>, where the suffix <em>-(r)i</em> affixes to the last word in the noun phrase, no matter the part of speech of that last word. (Basque is crazy cool!)</p>

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/05/picture-2.png" alt="Picture 2.png" border="0" width="539" height="66" /></p>
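
<p>Such a RegExp might look something like the following sketch. The function name <code>splitBasqueDative</code> and the <code>newInput</code>/<code>suffix</code> property names are assumptions for illustration. Since plenty of words simply end in <em>i</em>, the result is best read as a proposed alternative parse, not as a verdict that the argument is dative.</p>

```javascript
// Hypothetical sketch: propose splitting a phrase-final -(r)i off an argument.
// The lazy (.+?) leaves an optional "r" plus the final "i" for the suffix group.
function splitBasqueDative(input) {
  var matches = input.match(/^(.+?)(r?i)$/);
  if (matches === null) {
    return []; // the argument does not end in -(r)i
  }
  return [{ newInput: matches[1], suffix: matches[2] }];
}
```

<p>For instance, <code>splitBasqueDative('gizonari')</code> (&#8220;to the man&#8221;) proposes the stem <code>'gizona'</code> with the suffix <code>'ri'</code>, while <code>splitBasqueDative('gizon')</code> returns <code>[]</code>.</p>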

<h3>Conclusion</h3>

<p>In this blog post I&#8217;ve outlined some reasons why it would be unreasonable or very difficult to incorporate case-marker processing into our <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2">current parser strategy</a>. The case markers themselves are often hard to identify, the case markers do not align at the edge of arguments, and there is the question of what form of the argument should be passed to the verb for preview and/or execution. Luckily many languages allow for adpositions (prepositions and postpositions) as an alternative strategy to case as a means of marking the different grammatical functions of arguments. By limiting Ubiquity parsing to adpositions (and case-marked determiners), I believe we are able to reach a good compromise between each user&#8217;s natural language and an easily machine-processable form.</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>Note that when linguists talk about &#8220;case,&#8221; they could be referring to two different (though related) concepts: case (lowercase) is the observed pattern of affixes on nouns which indicate grammatical function, while Case (uppercase) refers to a theoretical (formal) feature of syntactic objects—certain lexical items &#8220;assign Case&#8221; or &#8220;receive Case&#8221; and its mismatches were ruled out in <a href="http://en.wikipedia.org/wiki/Government and binding theory">GB</a> syntax by the Case Filter. You&#8217;ll find GB linguistics papers referring to &#8220;case&#8221; when discussing Mandarin Chinese, for example, a language that doesn&#8217;t have any overt case (lowercase) and you&#8217;ll know immediately that this usage is an uppercase Case case. In this blog post I&#8217;ll be dealing primarily with the former descriptive notion.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:2">
<p>When I refer to &#8220;strongly case-marking languages,&#8221; I am referring to languages with a non-trivial inventory of cases (not just nominative, accusative, and genitive) and where a noun phrase&#8217;s case is not reflected on <a href="http://en.wikipedia.org/wiki/determiner (class)">determiners</a>. For example, <a href="http://en.wikipedia.org/wiki/German language">German</a> is excluded by this definition as case is realized exclusively on articles and there is no need to find and parse the noun head itself to identify its case—more information on German is in the section &#8220;finding the edges.&#8221;&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:3">
<p>In reality Turkish case morphology does get a little more complicated than this with some consonants shifting as well, but it is still possible to <a href="http://www.sfs.uni-tuebingen.de/iscl/Theses/makedonski.pdf">identify Turkish case with regular expressions</a>.&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:4">
<p>For those of you who were curious, this difference in Polish is based on the differing genders of each of these words. Data from <a href="http://en.wikipedia.org/wiki/Polish language">Polish language</a> on Wikipedia.&#160;<a href="#fnref:4" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:6">
<p>Example from <a href="http://en.wikipedia.org/wiki/Iʻrāb">Iʻrāb</a> on Wikipedia.&#160;<a href="#fnref:6" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:5">
<p>Thank you to <a href="http://bpick.tumblr.com/">Bailey Pickens</a> for help with the Latin data.&#160;<a href="#fnref:5" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/' rel='bookmark' title='Three ways to argue over arguments'>Three ways to argue over arguments</a></li>
<li><a href='http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/' rel='bookmark' title='Contribute: how your language identifies its arguments'>Contribute: how your language identifies its arguments</a></li>
<li><a href='http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/' rel='bookmark' title='Writing commands with semantic roles'>Writing commands with semantic roles</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/in-case-of-case/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Dates in the Month of May that Are of Interest to Linguists</title>
		<link>http://mitcho.com/blog/link/dates-in-the-month-of-may-that-are-of-interest-to-linguists/</link>
		<comments>http://mitcho.com/blog/link/dates-in-the-month-of-may-that-are-of-interest-to-linguists/#comments</comments>
		<pubDate>Fri, 01 May 2009 15:51:42 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[link]]></category>
		<category><![CDATA[humor]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[May]]></category>
		<category><![CDATA[McCawley]]></category>
		<category><![CDATA[University of Chicago]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1971</guid>
		<description><![CDATA[Happy May! May, as you surely know, is an important season of celebration for linguists. Some of my favorite items are below. From Dates in the Month of May that Are of Interest to Linguists by the late James D. McCawley: May 6, 1939. The University of Chicago trades Leonard Bloomfield to Yale University for [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/localizing-ubiquity-an-open-letter-to-linguists/' rel='bookmark' title='Localizing Ubiquity: an open letter to linguists'>Localizing Ubiquity: an open letter to linguists</a></li>
<li><a href='http://mitcho.com/blog/link/jerry-sadocks-automodular-grammar-on-itunes/' rel='bookmark' title='Jerry Sadock&#8217;s Automodular Grammar on iTunes'>Jerry Sadock&#8217;s Automodular Grammar on iTunes</a></li>
<li><a href='http://mitcho.com/blog/life/bathroom-graffiti/' rel='bookmark' title='Bathroom Graffiti'>Bathroom Graffiti</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>Happy May! May, as you surely know, is an important season of celebration for linguists. Some of my favorite items are below.</p>

<p>From <a href="http://specgram.com/LP/10.mccawley.may.html">Dates in the Month of May that Are of Interest to Linguists</a> by the late <a href="http://en.wikipedia.org/wiki/James D. McCawley">James D. McCawley</a>:</p>

<blockquote>
May 6, 1939. 
The University of Chicago trades Leonard Bloomfield to Yale University for two janitors and an undisclosed number of concrete gargoyles.
<br/><br/>
May 23, 38,471&#160;B.C.
God creates language.
<br/><br/>
May 29, 1962.
Angular brackets are discovered. Classes at M.I.T. are dismissed and much Latvian plum brandy is consumed.
<br/><br/>
May 31, 1951.
Chomsky discovers Affix-hopping and is reprimanded by his father for discovering rules on shabas.
</blockquote>

<p>Unfortunately May 31, 1951 was a Thursday&#8230;</p>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/localizing-ubiquity-an-open-letter-to-linguists/' rel='bookmark' title='Localizing Ubiquity: an open letter to linguists'>Localizing Ubiquity: an open letter to linguists</a></li>
<li><a href='http://mitcho.com/blog/link/jerry-sadocks-automodular-grammar-on-itunes/' rel='bookmark' title='Jerry Sadock&#8217;s Automodular Grammar on iTunes'>Jerry Sadock&#8217;s Automodular Grammar on iTunes</a></li>
<li><a href='http://mitcho.com/blog/life/bathroom-graffiti/' rel='bookmark' title='Bathroom Graffiti'>Bathroom Graffiti</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/link/dates-in-the-month-of-may-that-are-of-interest-to-linguists/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Adding Your Language to Ubiquity Parser 2</title>
		<link>http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/</link>
		<comments>http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/#comments</comments>
		<pubDate>Wed, 29 Apr 2009 11:44:20 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[how to]]></category>
		<category><![CDATA[argument structure]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[case marking]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[French]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[l10n]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[localization]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[semantic roles]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1956</guid>
		<description><![CDATA[NOTE: This blog post has now been added to the Ubiquity wiki and is updated there. Please disregard this article and instead follow these instructions. You&#8217;ve seen the video. You speak another language. And you&#8217;re wondering, &#8220;how hard is it to add my language to Ubiquity with Parser 2?&#8221; The answer: not that hard. With [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/' rel='bookmark' title='Ubiquity Parser: The Next Generation Demo'>Ubiquity Parser: The Next Generation Demo</a></li>
<li><a href='http://mitcho.com/blog/projects/rolling-out-the-roles/' rel='bookmark' title='Rolling out the Roles'>Rolling out the Roles</a></li>
<li><a href='http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/' rel='bookmark' title='Foxkeh demos Ubiquity Parser: The Next Generation'>Foxkeh demos Ubiquity Parser: The Next Generation</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p><strong>NOTE: This blog post has now been added to the Ubiquity wiki and is updated there. Please disregard this article and instead follow <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2/Localization_Tutorial">these instructions</a>.</strong></p>

<p>You&#8217;ve <a href="http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/">seen the video</a>. You speak another language. And you&#8217;re wondering, <strong>&#8220;how hard is it to add my language to Ubiquity with Parser 2?&#8221;</strong> The answer: <strong>not that hard.</strong> With a little bit of JavaScript and knowledge of and interest in your own language, you&#8217;ll be able to get at least rudimentary Ubiquity functionality in your language. Follow along in this step by step guide and please <a href="http://ubiquity.mozilla.com/trac/ticket/662">submit your (even incomplete) language files</a>!</p>

<p><em>As Ubiquity Parser 2 evolves, there is a chance that this specification will change in the future. Keep abreast of such changes on the <a href="http://ubiquity.mozilla.com/planet/">Ubiquity Planet</a> and/or <a href="http://mitcho.com/blog/">this blog</a> (<a href="http://mitcho.com/blog/feed/blog-only/">RSS</a>).</em></p>

<p><span id="more-1956"></span></p>

<h3>Set up your environment</h3>

<p>If you&#8217;re new to Ubiquity core development, you&#8217;ll want to first read the <a href="http://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_0.1_Development_Tutorial">Ubiquity 0.1 Development Tutorial</a> to learn how to get a live copy of the Ubiquity repository using <a href="http://en.wikipedia.org/wiki/Mercurial">Mercurial</a>. Once you&#8217;ve set up your Firefox profile to use this development version, make sure to try changing the <code>extensions.ubiquity.parserVersion</code> value to 2 in <code>about:config</code> (as seen in <a href="http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/">this demo video</a>) to verify that Parser 2 is working for you.</p>

<p>As you read along, you may find it beneficial to follow along in the languages currently included in Parser 2: <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/en.js">English</a>, <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/ja.js">Japanese</a>, <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/pt.js">Portuguese</a>, and <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/sv.js">Swedish</a> (and the incomplete <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/zh.js">Chinese</a> and <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/fr.js">French</a>).</p>

<h3>The structure of the language file</h3>

<p>Each language in Parser 2 gets its own file which acts as a <a href="https://developer.mozilla.org/En/Using_JavaScript_code_modules">JavaScript module</a>. You&#8217;ll need to look up the <a href="http://en.wikipedia.org/wiki/List of ISO 639-1 codes">ISO 639-1 code for your language</a>&#8230; Here we&#8217;ll use English (code <code>en</code>) as an example: the JavaScript language file would then be called <code>en.js</code> and go in the <code>/ubiquity/modules/parser/new/</code> directory of the repository.</p>

<p>Here is the basic template for a Ubiquity Parser 2 language file:</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
</pre></td><td class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">var</span> EXPORTED_SYMBOLS <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span><span style="color: #3366CC;">&quot;makeEnParser&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000066; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">typeof</span> window<span style="color: #009900;">&#41;</span> <span style="color: #339933;">==</span> <span style="color: #3366CC;">'undefined'</span><span style="color: #009900;">&#41;</span> <span style="color: #006600; font-style: italic;">// kick it chrome style</span>
  Components.<span style="color: #660066;">utils</span>.<span style="color: #003366; font-weight: bold;">import</span><span style="color: #009900;">&#40;</span><span style="color: #3366CC;">&quot;resource://ubiquity/modules/parser/new/parser.js&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #003366; font-weight: bold;">function</span> makeEnParser<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #003366; font-weight: bold;">var</span> en <span style="color: #339933;">=</span> <span style="color: #003366; font-weight: bold;">new</span> Parser<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'en'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
...
&nbsp;
  <span style="color: #000066; font-weight: bold;">return</span> en<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>


<p>After lines 1-4 which set up the <a href="https://developer.mozilla.org/En/Using_JavaScript_code_modules">JavaScript module</a>, everything else is wrapped in a factory function called <code>makeLaParser</code> (for Latin) or <code>makeEnParser</code> (for English, <code>en</code>) or <code>makeFrParser</code> (for French, <code>fr</code>), etc. This function initializes the new <code>Parser</code> object (line 7) with the appropriate language code, sets a bunch of parameters (elided above) and returns it. That&#8217;s it!</p>

<p>Now let&#8217;s walk through some of the parameters you must set to get your language working. For reference, the properties the language parser object is required to have are: <code>branching</code>, <code>anaphora</code>, and <code>roles</code>.</p>
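<p>As a quick sanity check while you fill in your language file, something like the following could report which required properties are still unset. This is only a sketch: <code>missingRequiredProperties</code> and the <code>draft</code> object are hypothetical names for illustration and are not part of Ubiquity.</p>

```javascript
// Hypothetical helper, not part of Ubiquity: report which of the
// required properties a language parser object is still missing.
var REQUIRED_PROPERTIES = ['branching', 'anaphora', 'roles'];

function missingRequiredProperties(parser) {
  return REQUIRED_PROPERTIES.filter(function (name) {
    return parser[name] === undefined;
  });
}

// Stand-in for a half-finished language parser object.
var draft = { branching: 'right' };
var missing = missingRequiredProperties(draft);
console.log(missing.join(', '));
```

<p>Here <code>draft</code> only sets <code>branching</code>, so the other two property names are reported.</p>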

<h3>Identifying your branching parameter</h3>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;">  en.<span style="color: #660066;">branching</span> <span style="color: #339933;">=</span> <span style="color: #3366CC;">'right'</span><span style="color: #339933;">;</span> <span style="color: #006600; font-style: italic;">// or 'left'</span></pre></div></div>


<p>One of the first things you&#8217;ll have to set for your parser is <strong>the <code>branching</code> parameter</strong>. Ubiquity Parser 2 uses the branching parameter to decide in which direction to look for an argument after finding a delimiter or &#8220;role marker&#8221; (most often, these are <a href="http://en.wikipedia.org/wiki/adposition">prepositions or postpositions</a>). For example, in English &#8220;from&#8221; is a delimiter for the <code>source</code> role and its argument is on its right.</p>

<table>
<tr><td>&nbsp;</td><td>&nbsp;</td><td colspan='2' style='background: transparent url(http://mitcho.com/i/cccarrow-right.png) no-repeat right bottom'>&nbsp;</td></tr>
<tr><td><b>to</b></td><td>Mary</td><td><b>from</b></td><td>John</td></tr>
</table>

<p>So &#8220;John&#8221; is a possible argument for the <code>source</code> role, but &#8220;Mary&#8221; should not be. Ubiquity can figure this out because English has the property <code>en.branching = 'right'</code>.</p>

<p>In Japanese, on the other hand, the argument of a delimiter like から (&#8220;from&#8221;) is found on the left of that delimiter, so a Japanese parser sets <code>.branching = 'left'</code>.</p>

<table>
<tr><td colspan='2' style='background: transparent url(http://mitcho.com/i/cccarrow-left.png) no-repeat left bottom'>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><td>メアリー</td><td><b>-から</b></td><td>ジョン</td><td><b>-に</b></td></tr>
<tr><td>Mary</td><td><b>from</b></td><td>John</td><td><b>to</b></td></tr>
</table>

<p>In general, if your language has prepositions, you should use <code>.branching = 'right'</code> and if your language has postpositions, you can use <code>.branching = 'left'</code>.</p>
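<p>To make the effect of the branching parameter concrete, here is a minimal sketch of the idea. It is not the actual Ubiquity implementation: <code>argumentFor</code> is a hypothetical function, and the input is assumed to be already split into words (which Japanese input would not normally be).</p>

```javascript
// Hypothetical sketch, not the Ubiquity implementation: collect the words on
// the side of a delimiter that `branching` names, stopping at the next delimiter.
function argumentFor(words, delimiterIndex, delimiters, branching) {
  var step = branching === 'right' ? 1 : -1;
  var found = [];
  for (var i = delimiterIndex + step; i >= 0 && i < words.length; i += step) {
    if (delimiters.indexOf(words[i]) !== -1) break;
    // Keep the words in their original left-to-right order.
    if (step === 1) found.push(words[i]); else found.unshift(words[i]);
  }
  return found.join(' ');
}

// English: the argument of "from" (index 2) is on its right.
var enWords = ['to', 'Mary', 'from', 'John'];
var enArg = argumentFor(enWords, 2, ['to', 'from'], 'right');

// Japanese: the argument of "から" (index 1) is on its left.
var jaWords = ['メアリー', 'から', 'ジョン', 'に'];
var jaArg = argumentFor(jaWords, 1, ['から', 'に'], 'left');

console.log(enArg, jaArg);
```

<p>With these inputs, <code>enArg</code> is &#8220;John&#8221; and <code>jaArg</code> is &#8220;メアリー&#8221;, matching the two tables above.</p>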

<p><strong>For more info</strong>:</p>

<ul>
<li>see <a href="http://en.wikipedia.org/wiki/Branching (linguistics)">branching</a> on Wikipedia.</li>
</ul>

<h3>Defining your roles</h3>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;">  en.<span style="color: #660066;">roles</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'goal'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'to'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'source'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'from'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'position'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'at'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'position'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'on'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'alias'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'as'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'instrument'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'using'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'instrument'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'with'</span><span style="color: #009900;">&#125;</span>
  <span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span></pre></div></div>


<p>The second required property is the inventory of semantic roles and their corresponding delimiters. Each entry has a <code>role</code> from the <a href="http://mitcho.com/blog/projects/rolling-out-the-roles/">inventory of semantic roles</a> and a corresponding delimiter. Note that this mapping can be <a href="http://en.wikipedia.org/wiki/many-to-many (data model)">many-to-many</a>, i.e., each role can have multiple possible delimiters and different roles can share delimiters. Try to cover all of the roles in the <a href="http://mitcho.com/blog/projects/rolling-out-the-roles/">inventory of semantic roles</a>.</p>
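<p>One way to see the many-to-many mapping is to index a roles list in both directions. The sketch below uses a shortened list; the <code>group</code> helper is hypothetical, and the <code>time</code>/<code>at</code> entry is an assumption added only to illustrate a delimiter shared between two roles.</p>

```javascript
// A shortened roles list. The 'time' entry is an illustrative assumption.
var roles = [
  {role: 'goal', delimiter: 'to'},
  {role: 'source', delimiter: 'from'},
  {role: 'position', delimiter: 'at'},
  {role: 'position', delimiter: 'on'},
  {role: 'time', delimiter: 'at'}
];

// Hypothetical helper: index the entries by one field, collecting the other.
function group(entries, keyField, valueField) {
  var index = {};
  entries.forEach(function (entry) {
    var key = entry[keyField];
    if (!Object.prototype.hasOwnProperty.call(index, key)) index[key] = [];
    index[key].push(entry[valueField]);
  });
  return index;
}

var delimitersByRole = group(roles, 'role', 'delimiter');
var rolesByDelimiter = group(roles, 'delimiter', 'role');

console.log(delimitersByRole.position.join(', '));  // one role, two delimiters
console.log(rolesByDelimiter.at.join(', '));        // one delimiter, two roles
```

<p>A delimiter that maps to more than one role simply means the parser has more than one candidate parse to consider for that argument.</p>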

<p><strong>For more info:</strong></p>

<ul>
<li><a href="http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/">Writing commands with semantic roles</a></li>
<li><a href="http://mitcho.com/blog/projects/rolling-out-the-roles/">the proposed inventory of semantic roles</a></li>
<li>Wikipedia entry on <a href="http://en.wikipedia.org/wiki/thematic relations">thematic relations</a></li>
</ul>

<h3>Entering your anaphora (&#8220;magic words&#8221;)</h3>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;">  en.<span style="color: #660066;">anaphora</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span><span style="color: #3366CC;">&quot;this&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;that&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;it&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;selection&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;him&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;her&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;them&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span></pre></div></div>


<p>The final required property is the <code>anaphora</code> property which takes a list of &#8220;magic words&#8221;. Currently there is no distinction between all the different <a href="http://en.wikipedia.org/wiki/deixis">deictic</a> <a href="http://en.wikipedia.org/wiki/anaphora (linguistics)">anaphora</a> which might refer to different things.</p>
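<p>As a rough illustration of what &#8220;no distinction&#8221; means, consider the sketch below. It assumes that a magic word stands in for the current selection, and <code>resolveAnaphor</code> is a hypothetical function, not the parser&#8217;s real code.</p>

```javascript
// Hypothetical sketch: if an argument is exactly one of the magic words,
// substitute the current selection; otherwise leave the argument alone.
var anaphora = ["this", "that", "it", "selection", "him", "her", "them"];

function resolveAnaphor(argument, selection) {
  var isMagicWord = anaphora.indexOf(argument.toLowerCase()) !== -1;
  return (isMagicWord && selection) ? selection : argument;
}

var selection = 'bonjour le monde';
var fromThis = resolveAnaphor('this', selection);
var fromHer = resolveAnaphor('her', selection);
var plain = resolveAnaphor('hello', selection);

console.log(fromThis, '|', fromHer, '|', plain);
```

<p>Both &#8220;this&#8221; and &#8220;her&#8221; resolve to the same selection text, while an ordinary word is passed through untouched.</p>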

<h3>Special cases</h3>

<p>Some special language features can be handled by overriding the default behavior from <code>Parser</code>. Many of these features are still in the works, however, so we&#8217;d love to get your comments!</p>

<h4>Languages with no spaces</h4>

<p>If your language does not delimit arguments (or words, more generally) with spaces, you will need to write a custom <code>wordBreaker()</code> function and set <code>usespaces = false</code> and <code>joindelimiter = ''</code>. For an example, please take a look at the <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/ja.js">Japanese</a> or <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/zh.js">Chinese</a>.</p>

<h4>Case marking languages</h4>

<p><strike>If you have a strongly <a href="http://en.wikipedia.org/wiki/grammatical case">case-marked</a> language, you&#8217;ll have to write some rules to identify those different cases in <code>wordBreaker()</code> and then add some extra <code>roles</code> for these case markers, but for a number of languages the current design does not allow an elegant solution for parsing such arguments. Updates to this issue will be posted to <a href="http://ubiquity.mozilla.com/trac/ticket/663">this trac ticket</a>.</p>

<p>In the mean time, however, if you could write a parser even with only the prepositions/postpositions in your language, that would be a great benefit in getting started in your language.</strike> <strong>UPDATE</strong>: a proposal on how to deal with strongly case-marked languages has been written here: <a href="http://mitcho.com/blog/projects/in-case-of-case/">In Case of Case&#8230;</a>.</p>

<h4>Stripping articles</h4>

<p>Some languages have some delimiters which combine with articles. For example, in French, the preposition &#8220;à&#8221; combines with the masculine definite article &#8220;le&#8221; but not &#8220;la&#8221;:</p>

<ol>
<li>à + la = à la</li>
<li>à + le = au</li>
</ol>

<p>You can add both &#8220;à&#8221; and &#8220;au&#8221; as delimiters of the <code>goal</code> role, but then you will get feminine arguments back with the determiner (e.g. &#8220;la table&#8221;) while masculine arguments would be parsed without a determiner (e.g. &#8220;chat&#8221;).</p>

<ol>
<li>&#8220;<b>à</b> la table&#8221; = &#8220;<b>to</b> the table&#8221;</li>
<li>&#8220;<b>au</b> chat&#8221; = &#8220;<b>to the</b> cat&#8221;</li>
</ol>

<p><strike>One possible solution to this is to write a custom <code>cleanArgument()</code> method. After arguments have been parsed and placed in their appropriate roles, each argument text (say, &#8220;la table&#8221; or &#8220;chat&#8221;) is passed to <code>cleanArgument()</code>. You can simply write a <code>cleanArgument()</code> to strip off any &#8220;la &#8221; at the beginning of the input and return it, and both example inputs will yield normalized arguments: &#8220;table&#8221; and &#8220;chat&#8221;, respectively.</strike> <strong>UPDATE</strong>: For more up-to-date information on how to deal with these types of articles, please see <a href="http://mitcho.com/blog/projects/solving-a-romantic-problem/">Solving a Romance Problem</a>.</p>

<h3>Test your parser</h3>

<p>Now you can go into <code>about:config</code> and change <code>extensions.ubiquity.language</code> to be your language code and restart. All the verbs and nountypes at this point will remain the same as in the English version, but it should obey the argument structure (the word order and delimiters) of your language.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> If you run into any trouble, feel free to ask for help on the <a href="http://groups.google.com/group/ubiquity-i18n">Ubiquity i18n listhost</a> or find me on the Ubiquity IRC channel (mitcho @ irc.mozilla.org#ubiquity). Of course, once you&#8217;re at a good stopping point, please <a href="http://ubiquity.mozilla.com/trac/ticket/662">contribute your language file to Ubiquity</a>!</p>

<h3>More to come&#8230;</h3>

<p>At this point, you&#8217;ve only localized the <a href="http://en.wikipedia.org/wiki/argument structure">argument structure</a> of your language&#8230; additional work will be required to localize the nountypes and verb names, which is <a href="http://groups.google.com/group/ubiquity-i18n/browse_thread/thread/ab4d876b1ea02d4">the subject of ongoing discussion</a>&#8230; <a href="http://groups.google.com/group/ubiquity-i18n">join the Google Group</a> to get in on the discussion!</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>At this point in time it&#8217;s also possible to test your parser at <code>chrome://parser-demo/content/index.html</code> if you make a couple other changes to your code&#8230; for more information, watch the <a href="http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/">Foxkeh demos Ubiquity Parser TNG</a> video. This option gives you more debug info as well.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/' rel='bookmark' title='Ubiquity Parser: The Next Generation Demo'>Ubiquity Parser: The Next Generation Demo</a></li>
<li><a href='http://mitcho.com/blog/projects/rolling-out-the-roles/' rel='bookmark' title='Rolling out the Roles'>Rolling out the Roles</a></li>
<li><a href='http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/' rel='bookmark' title='Foxkeh demos Ubiquity Parser: The Next Generation'>Foxkeh demos Ubiquity Parser: The Next Generation</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Attachment Ambiguity—or—when is the gyudon cheap?</title>
		<link>http://mitcho.com/blog/observation/attachment-ambiguity/</link>
		<comments>http://mitcho.com/blog/observation/attachment-ambiguity/#comments</comments>
		<pubDate>Wed, 15 Apr 2009 06:17:05 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[life]]></category>
		<category><![CDATA[observation]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[attachment ambiguity]]></category>
		<category><![CDATA[food]]></category>
		<category><![CDATA[Japanese culture]]></category>
		<category><![CDATA[Japanese language]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[syntax]]></category>
		<category><![CDATA[Tokyo]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1815</guid>
		<description><![CDATA[Every day on the way to work I walk by a fine establishment known as Yoshinoya (吉野家), Japan&#8217;s largest gyudon (牛丼) chain restaurant. For those of you whose lives have yet to be graced by gyudon, it&#8217;s a bowl of rice topped with beef and onions stewed in a sweet-savory soy-based sauce. Loving gyudon and [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/user-aided-disambiguation-a-demo/' rel='bookmark' title='User-Aided Disambiguation: a demo'>User-Aided Disambiguation: a demo</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/' rel='bookmark' title='Ubiquity in Firefox: Focus on Japanese'>Ubiquity in Firefox: Focus on Japanese</a></li>
<li><a href='http://mitcho.com/blog/projects/talking-ubiquity-in-japan-%e6%8b%a1%e5%bc%b5%e6%a9%9f%e8%83%bd%e5%8b%89%e5%bc%b7%e4%bc%9a%e3%81%ab%e3%81%a6%e7%99%ba%e8%a1%a8/' rel='bookmark' title='Talking Ubiquity in Japan: 拡張機能勉強会にて発表'>Talking Ubiquity in Japan: 拡張機能勉強会にて発表</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/04/yoshinoya.jpg" alt="yoshinoya.jpg" border="0" width="650" height="328" /></p>

<p>Every day on the way to work I walk by a fine establishment known as <a href="http://en.wikipedia.org/wiki/Yoshinoya">Yoshinoya</a> (吉野家), Japan&#8217;s largest <em>gyudon</em> (牛丼) chain restaurant. For those of you whose lives have yet to be graced by <a href="http://en.wikipedia.org/wiki/gyudon">gyudon</a>, it&#8217;s a bowl of rice topped with beef and onions stewed in a sweet-savory soy-based sauce. Loving gyudon and being a cheapskate, I naturally noticed the recent 50 yen off gyudon promotion at Yoshinoya. The photo above shows part of that sign.</p>

<p>Part of this sign, though, made me think about our <a href="http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/">new Ubiquity parser</a>. In particular, it was the <strong>attachment ambiguity</strong> in the end date of the promotion. The text in the photo above literally is &#8220;April 15th (Wed.) 8PM until&#8221;. (Note that Japanese is a strongly head-final language, and that the &#8220;until&#8221; is a postposition.) There are two possible readings for this expression, as illustrated by the two <a href="http://en.wikipedia.org/wiki/principle of compositionality">composition</a> trees below.</p>

<p><span id="more-1815"></span></p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/04/yoshinoya-trees.jpg" alt="yoshinoya-trees.jpg" border="0" width="658" height="157" /></center></p>

<p>The first tree, on the left, represents the reading &#8220;until (April 15th 8PM)&#8221;, while the second represents two arguments: &#8220;on April 15th&#8221; and &#8220;until 8PM&#8221;. In other words, in the first reading, the promotion begins at some earlier date and extends until April 15th at 8PM while, in the second reading, the promotion is one day only, on April 15th, until 8pm. Such syntactic ambiguities are called &#8220;attachment ambiguities&#8221; in linguistics as it is an ambiguity of where different arguments &#8220;attach&#8221; in a tree representation.</p>

<p>This attachment ambiguity was possible because there was no clear <a href="http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/">marker</a> on &#8220;April 15th,&#8221; which may have disambiguated it as &#8220;on April 15th&#8221;. In fact, in many languages this time position argument comes with no case marker or preposition, or the marker is optional, which makes such arguments difficult to parse. If such a sentence is entered with spaces, the <a href="http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/">Ubiquity Parser: The Next Generation</a> would try a parse where &#8220;8PM&#8221; is the &#8220;until&#8221; or <code>goal</code> argument and &#8220;April 15th&#8221; is an <code>object</code> argument, but it would only check the noun type of &#8220;April 15th&#8221;, not put it in <a href="http://mitcho.com/blog/projects/rolling-out-the-roles/">the correct semantic role</a> (<code>position</code>). Perhaps this is something to think about in the future.</p>

<p>These types of situations will surely come up as we continue work on the Ubiquity parser, making it essential to look at different languages. <strong>Are there certain kinds of arguments in your language that do not have any word-external markers such as case or prepositions/postpositions?</strong></p>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/user-aided-disambiguation-a-demo/' rel='bookmark' title='User-Aided Disambiguation: a demo'>User-Aided Disambiguation: a demo</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/' rel='bookmark' title='Ubiquity in Firefox: Focus on Japanese'>Ubiquity in Firefox: Focus on Japanese</a></li>
<li><a href='http://mitcho.com/blog/projects/talking-ubiquity-in-japan-%e6%8b%a1%e5%bc%b5%e6%a9%9f%e8%83%bd%e5%8b%89%e5%bc%b7%e4%bc%9a%e3%81%ab%e3%81%a6%e7%99%ba%e8%a1%a8/' rel='bookmark' title='Talking Ubiquity in Japan: 拡張機能勉強会にて発表'>Talking Ubiquity in Japan: 拡張機能勉強会にて発表</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/observation/attachment-ambiguity/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Scoring and Ranking Suggestions</title>
		<link>http://mitcho.com/blog/observation/scoring-and-ranking-suggestions/</link>
		<comments>http://mitcho.com/blog/observation/scoring-and-ranking-suggestions/#comments</comments>
		<pubDate>Tue, 07 Apr 2009 07:17:26 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[observation]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[candidates]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[constraints]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[Optimality Theory]]></category>
		<category><![CDATA[order]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[ranking]]></category>
		<category><![CDATA[score]]></category>
		<category><![CDATA[suggestions]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1745</guid>
		<description><![CDATA[I just spent some time reviewing how Ubiquity currently ranks its suggestions in relation to Parser The Next Generation so I thought I&#8217;d put some of these thoughts down in writing. The issue of ranking Ubiquity suggestions can be restated as predicting an optimal output given a certain input and various conflicting considerations. Ubiquity [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/' rel='bookmark' title='Ubiquity Parser: The Next Generation Demo'>Ubiquity Parser: The Next Generation Demo</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/' rel='bookmark' title='Ubiquity in Firefox: Focus on Japanese'>Ubiquity in Firefox: Focus on Japanese</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-commands-by-the-numbers/' rel='bookmark' title='Ubiquity Commands by The Numbers'>Ubiquity Commands by The Numbers</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>I just spent some time reviewing how Ubiquity currently ranks its suggestions in relation to <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">Parser The Next Generation</a> so I thought I&#8217;d put some of these thoughts down in writing.</p>

<p>The issue of ranking Ubiquity suggestions can be restated as predicting an optimal output given a certain input and various conflicting considerations. Ubiquity (1.8, as of this writing) computes four &#8220;scores&#8221; for each suggestion:</p>

<p><span id="more-1745"></span></p>

<ol>
<li><code>duplicateDefaultMatchScore</code>: 100 by default—lowered if an unused argument gets multiple suggestions (in <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/file/0aaeae361c33/ubiquity/modules/parser/parser.js#l558">the words of the code</a>: &#8220;reduce the match score so that multiple entries with the same verb are only shown if there are no other verbs.&#8221;)</li>
<li><code>frequencyMatchScore</code>: a score from the <code>suggestion memory</code> of the frequency of the suggestion&#8217;s verb, given the input verb (currently the first word) or nothing, in the case of noun-first suggestions</li>
<li><code>verbMatchScore</code>: float in [0,1]: (as described <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_Documentation#Scoring_the_Quality_of_the_Verb_Match">here</a>)

<ul>
<li>0.75 is returned when it is a noun-first suggestion (by virtue of the fact that <code>String.indexOf('')==0</code>)</li>
<li>1 if the verb name is equivalent across input-output</li>
<li>in [0.75,1) if the input is a prefix of the suggestion verb name</li>
<li>in [0.5,0.75) if the input is a non-prefix substring of the suggestion verb</li>
<li>in [0.25,0.5) if the input is a prefix of one of the <code>synonyms</code></li>
<li>in [0,0.25) if the input is a non-prefix substring of one of the <code>synonyms</code></li>
</ul></li>
<li><code>argMatchScore</code>: the number of arguments with matching &#8220;specific&#8221; nountypes, where &#8220;specific&#8221; is designated by the nountype having property <code>rankLast=false</code>.</li>
</ol>
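<p>The <code>verbMatchScore</code> tiers in the list above can be sketched as a small classifier. This is a simplified illustration, not the code in <code>parser.js</code>: the hypothetical <code>verbMatchTier</code> only reports the lower and upper bound of the tier a match falls into, since the exact value within each range is not described here.</p>

```javascript
// Sketch only: returns the score range from the list above as {min, max}
// rather than a single score; null means the input does not match at all.
function verbMatchTier(input, name, synonyms) {
  if (input === name) return {min: 1, max: 1};
  var at = name.indexOf(input);
  if (at === 0) return {min: 0.75, max: 1};     // prefix of the verb name
  if (at > 0) return {min: 0.5, max: 0.75};     // elsewhere in the verb name
  var best = null;
  (synonyms || []).forEach(function (synonym) {
    var pos = synonym.indexOf(input);
    if (pos === 0) best = {min: 0.25, max: 0.5};  // prefix of a synonym
    else if (pos > 0 && best === null) best = {min: 0, max: 0.25};
  });
  return best;
}

// The verb names and synonyms below are made up for illustration.
var exact = verbMatchTier('translate', 'translate', []);
var prefix = verbMatchTier('trans', 'translate', []);
var nounFirst = verbMatchTier('', 'translate', []);
var viaSynonym = verbMatchTier('send', 'email', ['send mail']);

console.log(exact.min, prefix.min, nounFirst.min, viaSynonym.min);
```

<p>Note how the empty input of a noun-first suggestion lands at the bottom of the prefix tier, because <code>indexOf('')</code> is 0 for any string.</p>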

<p>With the numeric scores for each of these criteria, a partial order of suggestions is constructed using a <a href="http://en.wikipedia.org/wiki/lexicographic order">lexicographic order</a>: that is, compare candidates first using <code>duplicateDefaultMatchScore</code>, break ties using <code>frequencyMatchScore</code>, if still tied break using <code>verbMatchScore</code>, and if still tied break using <code>argMatchScore</code>. This paradigm of constraints is called &#8220;strictly ranked&#8221;: lower constraints, no matter how well a candidate scores on them, can never overcome a loss at a higher constraint. A useful corollary is that a candidate&#8217;s scores on lower constraints need not be computed if a higher constraint already dooms it to a lower position.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>
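<p>A strictly ranked comparison like this is easy to express as a sort comparator. The following is a minimal sketch, assuming each suggestion carries its four scores as plain properties; the verb names and score values are invented for the example, and <code>compareSuggestions</code> is not the function Ubiquity actually uses.</p>

```javascript
// Constraints in ranking order, highest first.
var SCORE_ORDER = ['duplicateDefaultMatchScore', 'frequencyMatchScore',
                   'verbMatchScore', 'argMatchScore'];

// Comparator for Array.prototype.sort: higher scores sort first, and a
// lower-ranked score is only consulted when all higher-ranked ones tie.
function compareSuggestions(a, b) {
  for (var i = 0; i < SCORE_ORDER.length; i++) {
    var key = SCORE_ORDER[i];
    if (a[key] !== b[key]) return b[key] - a[key];
  }
  return 0;
}

// Made-up suggestions for illustration.
var suggestions = [
  {verb: 'email', duplicateDefaultMatchScore: 100, frequencyMatchScore: 2,
   verbMatchScore: 0.8, argMatchScore: 3},
  {verb: 'edit', duplicateDefaultMatchScore: 100, frequencyMatchScore: 5,
   verbMatchScore: 0.8, argMatchScore: 0},
  {verb: 'eval', duplicateDefaultMatchScore: 100, frequencyMatchScore: 5,
   verbMatchScore: 0.9, argMatchScore: 0}
];

suggestions.sort(compareSuggestions);
var ranked = suggestions.map(function (s) { return s.verb; });
console.log(ranked.join(' > '));
```

<p>Here &#8220;email&#8221; ends up last despite having the best <code>argMatchScore</code>, because it already lost on <code>frequencyMatchScore</code>.</p>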

<h3>Ranking in The Next Generation</h3>

<p>One of the goals of <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">Parser The Next Generation</a> is to make noun/argument-first input a first-class citizen of Ubiquity, improving its suggestions in particular to the benefit of <a href="http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/">verb-final languages</a>. Arguments will be split up and tested against different noun types before a verb is even entered into the input, in which case target verbs can be ranked according to the appropriateness of the input&#8217;s arguments. As such, I believe the <code>argMatchScore</code> criterion above should either be ranked higher in a strictly ranked model or be allowed to overtake lower scores for the higher constraints in a non-strictly ranked model.</p>

<p>The <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">Parser The Next Generation</a> proposal and <a href="http://mitcho.com/code/ubiquity/parser-demo">demo</a> currently order suggestions by a product of various criteria&#8217;s scores, rather than a lexicographic order of strictly ranked constraints. The component factors are:</p>

<ol>
<li><code>0.5</code> for parses where the verb was suggested</li>
<li><code>0.5</code> for each extra (>1) <code>object</code> argument (essentially &#8220;unused words&#8221; in the previous parser)</li>
<li>the score of each argument against that semantic role&#8217;s target noun type</li>
<li><code>0.8</code> for each unset argument of that verb</li>
</ol>

<p>Each component score is a value in [0,1], so the score is always non-increasing across the derivation. This offers a natural way to optimize the candidate set creation: if a possible parse ever gets a score below a magic &#8220;threshold&#8221; value, it is immediately thrown away.</p>
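<p>The pruning idea can be sketched in a few lines. Because every factor is in [0,1], multiplying in another factor can never raise the running score, so a parse that falls below the threshold can be dropped without looking at its remaining factors. The threshold value and the <code>scoreOrPrune</code> name below are assumptions for illustration only.</p>

```javascript
// Hypothetical threshold; the actual value is not given here.
var THRESHOLD = 0.3;

// Multiply factors one at a time and give up as soon as the running
// product drops below the threshold.
function scoreOrPrune(factors, threshold) {
  var score = 1;
  for (var i = 0; i < factors.length; i++) {
    score *= factors[i];
    if (score < threshold) return null;  // pruned: it cannot recover
  }
  return score;
}

// A suggested verb (0.5) with one unset argument (0.8) survives.
var kept = scoreOrPrune([0.5, 0.8], THRESHOLD);

// A suggested verb (0.5) plus an extra object argument (0.5) is already
// below the threshold, so the final 0.8 is never multiplied in.
var pruned = scoreOrPrune([0.5, 0.5, 0.8], THRESHOLD);

console.log(kept, pruned);
```

<p>The earlier a low factor shows up in the derivation, the more work is saved on that candidate.</p>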

<p>A possible problem with the current Parser TNG scoring model is that it will implicitly hinder verbs and parses with more arguments as it could have more sub-1 noun type score factors—this consideration may be great enough that a weighted additive model should be considered over a multiplicative one.</p>
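<p>To see the bias, compare two parses whose arguments all match equally well. The sketch below contrasts the product with one possible reading of a &#8220;weighted additive&#8221; model, namely a weighted average; that choice, the equal weights, and the function names are all assumptions made for the example.</p>

```javascript
// Multiplicative model: every extra sub-1 factor lowers the score.
function product(scores) {
  return scores.reduce(function (acc, s) { return acc * s; }, 1);
}

// Assumed form of a weighted additive score: a weighted average, so the
// result stays in [0,1] regardless of how many arguments there are.
function weightedAverage(scores, weights) {
  var total = 0, weightSum = 0;
  for (var i = 0; i < scores.length; i++) {
    total += scores[i] * weights[i];
    weightSum += weights[i];
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

// Each argument matches its noun type with the same score of 0.5.
var oneArg = [0.5];
var threeArgs = [0.5, 0.5, 0.5];

console.log(product(oneArg), product(threeArgs));
console.log(weightedAverage(oneArg, [1]), weightedAverage(threeArgs, [1, 1, 1]));
```

<p>Under the product, the three-argument parse scores a quarter of the one-argument parse even though each argument fits just as well; under the average, the two parses tie.</p>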

<p><strong>How do you think we can make Ubiquity&#8217;s suggestion ranking smarter? What other factors should be considered, and what factors could be left out?</strong></p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>For all the linguists in the audience, if this sounds like <a href="http://en.wikipedia.org/wiki/Optimality Theory">Optimality Theory</a>, you would be right—there&#8217;s a little bit of <a href="http://roa.rutgers.edu/view.php3?roa=537">Prince and Smolensky (1993)</a> hanging out <a href="http://ubiquity.mozilla.com">in your browser</a>.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/' rel='bookmark' title='Ubiquity Parser: The Next Generation Demo'>Ubiquity Parser: The Next Generation Demo</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/' rel='bookmark' title='Ubiquity in Firefox: Focus on Japanese'>Ubiquity in Firefox: Focus on Japanese</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-commands-by-the-numbers/' rel='bookmark' title='Ubiquity Commands by The Numbers'>Ubiquity Commands by The Numbers</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/observation/scoring-and-ranking-suggestions/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Where&#8217;s The Verb?</title>
		<link>http://mitcho.com/blog/observation/wheres-the-verb/</link>
		<comments>http://mitcho.com/blog/observation/wheres-the-verb/#comments</comments>
		<pubDate>Wed, 25 Mar 2009 07:10:20 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[observation]]></category>
		<category><![CDATA[commands]]></category>
		<category><![CDATA[infinitive]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[subjunctive]]></category>
		<category><![CDATA[typology]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[verb-final]]></category>
		<category><![CDATA[verbs]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1647</guid>
		<description><![CDATA[Ubiquity&#8217;s proposed new parser design is based on a principles and parameters philosophy: we can build an underlying universal parser and, for each individual language, we simply set some &#8220;parameters&#8221; to tell the parser how to act. As we consider the design&#8217;s pros and cons, it&#8217;s important to reflect back on the linguistic data and [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/ubiquity-i18n-questions-to-ask/' rel='bookmark' title='Ubiquity i18n: questions to ask'>Ubiquity i18n: questions to ask</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/' rel='bookmark' title='Ubiquity Parser: The Next Generation Demo'>Ubiquity Parser: The Next Generation Demo</a></li>
<li><a href='http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/' rel='bookmark' title='Contribute: how your language identifies its arguments'>Contribute: how your language identifies its arguments</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>Ubiquity&#8217;s <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">proposed new parser design</a> is based on a <a href="http://en.wikipedia.org/wiki/principles and parameters">principles and parameters</a> philosophy: we can build an underlying universal parser and, for each individual language, we simply set some &#8220;parameters&#8221; to tell the parser how to act. As we consider the design&#8217;s pros and cons, it&#8217;s important to reflect back on the linguistic data and see if this architecture can adequately handle the range of linguistic data attested in our languages.</p>

<p>Today I&#8217;ll highlight some disparate typological data to help us understand these questions: <strong>where&#8217;s the verb?</strong> and <strong>what does the verb look like?</strong>
<span id="more-1647"></span>
There are broadly three different verb forms taken in commands in different languages:<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<ol>
<li>the <a href="http://en.wikipedia.org/wiki/infinitive">infinitive</a>,</li>
<li><a href="http://en.wikipedia.org/wiki/subjunctive mood">subjunctive mood</a>, or</li>
<li>a special verb form such as <a href="http://en.wikipedia.org/wiki/imperative">imperative</a>, <a href="http://en.wikipedia.org/wiki/participial">participial</a>, or conjunctive (such as Japanese <a href="http://en.wikipedia.org/wiki/Japanese verb conjugations#Te_form">て form</a>)</li>
</ol>

<p>Let&#8217;s give an example of each:</p>

<p><strong>Infinitive</strong> (English):<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup></p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
</pre></td><td class="code"><pre class="english" style="font-family:monospace;">Hit me!</pre></td></tr></table></div>


<p><strong>Subjunctive mood</strong> (Modern Greek): &#8220;Eat it all!&#8221;</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>2
3
</pre></td><td class="code"><pre class="english" style="font-family:monospace;">Na   to fas olo!
SUBJ it eat all</pre></td></tr></table></div>


<p><strong>Imperative form</strong> (French): &#8220;Eat it!&#8221;</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>4
5
</pre></td><td class="code"><pre class="french" style="font-family:monospace;">Mange   -le!
eat.IMP it</pre></td></tr></table></div>


<p>It&#8217;s important to note that some languages have <em>multiple forms available</em> for the same command. For example:</p>

<p><strong>Dutch</strong>: three ways to say &#8220;watch out!&#8221; with the same verb</p>

<ol>
<li>Infinitive: <code>Oppassen!</code></li>
<li>Imperative: <code>Pas op!</code></li>
<li>Participial: <code>Opgepast!</code></li>
</ol>

<p>Similarly, I received <a href="http://mitcho.com/blog/projects/ubiquity-i18n-questions-to-ask/#comment-974">a great comment by PhiliKON</a> on German and <a href="http://mitcho.com/blog/projects/ubiquity-i18n-questions-to-ask/#comment-980">associated data by Robert Kaiser</a> on my blog post yesterday:</p>

<p><strong>German</strong>: &#8220;search hello with google&#8221;</p>

<ol>
<li>Infinitive: <code>hello mit google suchen</code></li>
<li>Imperative: <code>suche hello mit google</code></li>
</ol>

<p>In addition, German and Dutch are interesting as they are <a href="http://en.wikipedia.org/wiki/V2 word order">verb second (V2)</a> languages, so the verb may surface at the beginning or the end of the sentence, depending on the form.</p>

<p>The <a href="https://wiki.mozilla.org/User:Mitcho/ParserTNG">new parser design</a> (which <a href="http://mitcho.com/code/ubiquity/parser-demo/">you can demo</a>) assumes for simplicity that the verb should be found at the beginning or the end of the input, which is consistent with the data I&#8217;ve seen (modulo <a href="http://en.wikipedia.org/wiki/Clitic#Clitics_in_Romance_languages">clitics</a>). Multiple verb forms could be accounted for by supporting &#8220;synonyms&#8221; of the verbs.</p>
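
<p>As a rough illustration of this assumption, here is a minimal JavaScript sketch. It is not the actual parser code: the <code>VERB_FORMS</code> table and the <code>findVerb</code> function are hypothetical names made up for this example, and only the two German forms come from the data above. It checks just the first and last words of the input against the known forms of each verb.</p>

<pre class="javascript" style="font-family:monospace;">// Hypothetical verb table: each canonical verb lists its surface forms ("synonyms").
var VERB_FORMS = {
  search: ['suche', 'suchen']
};

// Look for a known verb form at the first or last word of the input only.
function findVerb(input) {
  var words = input.trim().split(/\s+/);
  var first = words[0].toLowerCase();
  var last = words[words.length - 1].toLowerCase();
  for (var verb of Object.keys(VERB_FORMS)) {
    var forms = VERB_FORMS[verb];
    if (forms.indexOf(first) !== -1) {
      return { verb: verb, position: 'initial', rest: words.slice(1) };
    }
    if (forms.indexOf(last) !== -1) {
      return { verb: verb, position: 'final', rest: words.slice(0, -1) };
    }
  }
  return null; // no known verb at either edge of the input
}</pre>

<p>With this table, both <code>suche hello mit google</code> and <code>hello mit google suchen</code> would resolve to the same <code>search</code> verb, while an input with the verb in the middle would not be recognized.</p>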

<p><strong>What are the different ways verbs are expressed in commands in your language? Is the verb always found at the beginning or the end of the sentence? Is it ever somewhere in the middle?</strong></p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>Some of the data and theoretical support for this section comes from, among other sources, Sabine Iatridou&#8217;s <a href="http://web.mit.edu/linguistics/people/faculty/iatridou/de_modo_imperativo.pdf">De Modo Imperativo</a> lecture notes.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:2">
<p>Many refer to this in English as an &#8220;imperative form,&#8221; but in Modern English this is arguably the same as the infinitive.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/ubiquity-i18n-questions-to-ask/' rel='bookmark' title='Ubiquity i18n: questions to ask'>Ubiquity i18n: questions to ask</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/' rel='bookmark' title='Ubiquity Parser: The Next Generation Demo'>Ubiquity Parser: The Next Generation Demo</a></li>
<li><a href='http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/' rel='bookmark' title='Contribute: how your language identifies its arguments'>Contribute: how your language identifies its arguments</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/observation/wheres-the-verb/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Automating the Linguist&#8217;s Job</title>
		<link>http://mitcho.com/blog/projects/automating-the-linguists-job/</link>
		<comments>http://mitcho.com/blog/projects/automating-the-linguists-job/#comments</comments>
		<pubDate>Tue, 24 Mar 2009 08:57:58 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[analogy]]></category>
		<category><![CDATA[automation]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[deduction]]></category>
		<category><![CDATA[Dutch]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[patterns]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1634</guid>
		<description><![CDATA[At the end of my blog post yesterday I hinted at an exciting possible approach to Ubiquity&#8217;s localization: In the future we ideally could build a web-based system to collect these &#8220;utterances.&#8221; We could &#8230; generate parser parameters based on those sentences. That would essentially reduce the parser-construction process to a more run-of-the-mill string translation [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/ubiquity-i18n-questions-to-ask/' rel='bookmark' title='Ubiquity i18n: questions to ask'>Ubiquity i18n: questions to ask</a></li>
<li><a href='http://mitcho.com/blog/projects/localizing-ubiquity-an-open-letter-to-linguists/' rel='bookmark' title='Localizing Ubiquity: an open letter to linguists'>Localizing Ubiquity: an open letter to linguists</a></li>
<li><a href='http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/' rel='bookmark' title='Writing commands with semantic roles'>Writing commands with semantic roles</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>At the end of <a href="http://mitcho.com/blog/projects/ubiquity-i18n-questions-to-ask/">my blog post yesterday</a> I hinted at an exciting possible approach to Ubiquity&#8217;s localization:</p>

<blockquote>
  <p>In the future we ideally could build a web-based system to collect these &#8220;utterances.&#8221; We could &#8230; generate parser parameters based on those sentences. That would essentially reduce the parser-construction process to a more run-of-the-mill string translation process.</p>
</blockquote>

<p>If we build this type of &#8220;command-bank&#8221; of common Ubiquity input translated into various languages, we could build a tool to learn various features of each language and generate each parser, essentially <em>learning the language based on data</em>. Today I&#8217;ll elaborate on how I believe this could be possible, by analogy to another language learning device: <strong>the human</strong>.</p>

<p><span id="more-1634"></span></p>

<h3>Step 1: learning words</h3>

<p>How does a human learn language? Without getting into any <a href="http://en.wikipedia.org/wiki/language acquisition">details or theory</a>, we can say that the input for a language learner is always a combination of <em>linguistic input and a referent</em>. In the case of a child, this could be a pairing of linguistic input with <em>real world stimulus</em>:</p>

<p><center></p>

<table style='border:none;'><tr><th>input</th><th>referent</th></tr>
<tr><td style='font-size:2em;color:orange;font-weight:bold;text-align:center;'>“taiyaki!”</td><td><img src='http://farm4.static.flickr.com/3543/3357452751_977fcce70c.jpg?v=0' width='300'/><br/>
by <a href='http://www.flickr.com/photos/makitani/3357452751/'>makitani</a> via <a href='http://creativecommons.org'>creative commons</a>.</td></tr>
<tr><td style='font-size:2em;color:orange;font-weight:bold;width:50%;text-align:center;'>“cat!”</td><td><img src='http://farm4.static.flickr.com/3285/2387513295_2768ddf662.jpg?v=0' width='300'/><br />
by <a href='http://www.flickr.com/photos/victoriachan/2387513295/in/set-72157604986983169/'>victoriachan</a> via <a href='http://creativecommons.org'>creative commons</a>.</td></tr>
</table>

<p></center></p>

<p>The human child will hear &#8220;cat&#8221; while looking at the cat and, with time and repetition, learn that that thing is called a &#8220;cat,&#8221; and <a href="http://en.wikipedia.org/wiki/taiyaki">some other thing</a> is called &#8220;taiyaki.&#8221;</p>

<p>Similarly, we could take single-verb data points from our command-bank to match new words with a known referent: in this case, the base English string. Here&#8217;s an example from <a href="http://jan.moesen.nu/">Jan&#8217;s</a> comment on <a href="http://mitcho.com/blog/projects/ubiquity-i18n-questions-to-ask/">yesterday&#8217;s sample survey</a>.</p>

<p><center></p>

<table style='border:none;'><tr><th>input (Dutch)</th><th>referent (English)</th></tr>
<tr><td style='font-size:2em;color:orange;font-weight:bold;text-align:center;'>zoek</td><td style='font-size:2em;color:blue;font-weight:bold;text-align:center;'>search</td></tr>
</table>

<p></center></p>

<h3>Step 2: deduction</h3>

<p>Now suppose we know some single words like &#8220;taiyaki&#8221; and &#8220;cat.&#8221; Consider the two situations below. Given the first sentence and its referent, &#8220;mitcho&#8217;s eating a taiyaki,&#8221; the child could intuit the appropriate linguistic representation for the second situation.</p>

<p><center></p>

<table style='border:none;'><tr><th>input</th><th>referent</th></tr>
<tr><td style='font-size:2em;color:orange;font-weight:bold;width:50%;text-align:center;'>“mitcho&#8217;s eating a taiyaki!”</td><td><img src="http://mitcho.com/blog/wp-content/uploads/2009/03/eattaiyaki.jpg" alt="eattaiyaki.jpg" border="0" width="300" height="225" /></td></tr>
<tr><td style='font-size:2em;color:red;font-weight:bold;text-align:center;'>???</td><td><img src="http://mitcho.com/blog/wp-content/uploads/2009/03/eatcat.jpg" alt="eatcat.jpg" border="0" width="300" height="225" /></td></tr>
</table>

<p></center></p>

<p>The process is simple. First note that there is only one variable changed between the two situations: the taiyaki has been replaced by a cat head. You can then construct the correct utterance <em>by analogy</em>, replacing &#8220;taiyaki&#8221; with &#8220;cat,&#8221; yielding &#8220;mitcho&#8217;s eating a cat!&#8221;<sup id="fnref:2"><a href="#fn:2" rel="footnote">1</a></sup></p>

<p>Similarly, we could build a tool to analyze the data in a translated command-bank to identify particular features of each language, generating at least basic parsers for each language. Such a task would require a number of <em><a href="http://en.wikipedia.org/wiki/minimal pairs">minimal pairs</a></em> in our data set—here&#8217;s one such example from yesterday&#8217;s survey (with Dutch data from <a href="http://jan.moesen.nu/">Jan</a>):</p>

<p><center></p>

<table style='border:none;'><tr><th>input (Dutch)</th><th>referent (English)</th></tr>
<tr><td style='font-size:1.5em;color:orange;font-weight:bold;text-align:center;'>zoek HELLO met Google</td><td>
<span style='font-size:1.5em;color:blue;font-weight:bold;'>search HELLO with Google</span><br/>
<code>
<pre>Parse {
  verb:      'search',
  arguments: {
    object:  ['HELLO'],
    service: 'Google'
  }
}</pre>
</code></td></tr>
<tr><td style='font-size:1.5em;color:orange;font-weight:bold;text-align:center;'>zoek dit met Google</td><td>
<span style='font-size:1.5em;color:blue;font-weight:bold;'>search this with Google</span><br/>
<code>
<pre>Parse {
  verb:      'search',
  arguments: {
    object:  ['this'],
    service: 'Google'
  }
}</pre>
</code></td></tr></table>

<p></center></p>

<p>A simple string analysis<sup id="fnref:3"><a href="#fn:3" rel="footnote">2</a></sup> would tell us that the text <code>HELLO</code> was replaced by <code>dit</code> in the latter Dutch sentence. Meanwhile, since the English reference sentence is chosen manually, we also know the appropriate parses for each of those sentences. An object difference operation would note that the <code>object</code> property was changed from a value of <code>'HELLO'</code> to <code>'this'</code>. We could then map <code>dit</code> to the English <code>this</code>. We&#8217;ve now learned one (of perhaps many) Dutch deictic pronouns (aka &#8220;magic words&#8221;).</p>
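
<p>The two comparisons described above could be sketched roughly as follows. This is only an illustration under simplifying assumptions: the function names <code>tokenDiff</code>, <code>parseDiff</code>, and <code>learnWord</code> are hypothetical, sentences are compared word by word rather than with a real string-distance measure, and a pair counts as minimal only when exactly one word and exactly one argument differ.</p>

<pre class="javascript" style="font-family:monospace;">// Return the single position where two equal-length word lists differ, or null.
function tokenDiff(a, b) {
  var x = a.split(/\s+/);
  var y = b.split(/\s+/);
  if (x.length !== y.length) { return null; }
  var found = null;
  for (var i = 0; i !== x.length; i++) {
    if (x[i] !== y[i]) {
      if (found !== null) { return null; } // more than one change: not a minimal pair
      found = { from: x[i], to: y[i] };
    }
  }
  return found;
}

// Return the single argument whose value differs between two parses, or null.
function parseDiff(p, q) {
  var found = null;
  for (var role of Object.keys(p.arguments)) {
    var before = String(p.arguments[role]);
    var after = String(q.arguments[role]);
    if (before !== after) {
      if (found !== null) { return null; } // more than one argument changed
      found = { role: role, from: before, to: after };
    }
  }
  return found;
}

// Pair the changed target-language word with the changed English value.
function learnWord(pairA, pairB) {
  var text = tokenDiff(pairA.input, pairB.input);
  var parse = parseDiff(pairA.parse, pairB.parse);
  if (text === null || parse === null) { return null; }
  return { word: text.to, meaning: parse.to, role: parse.role };
}</pre>

<p>Given the two Dutch sentences and their parses from the table above, <code>learnWord</code> would pair <code>dit</code> with <code>this</code> in the <code>object</code> role.</p>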

<p>Given <a href="http://mitcho.com/code/ubiquity/parser-demo/">an adequately universal but customizable parser design</a>, we can then develop tests for various parameters by constructing appropriate <a href="http://en.wikipedia.org/wiki/minimal pairs">minimal pairs</a> in the base sentences and having them translated.<sup id="fnref:1"><a href="#fn:1" rel="footnote">3</a></sup> As noted yesterday, such a system could reduce the laborious task of writing individual parsers to a task of string translation, which <a href="https://wiki.mozilla.org/L10n:Home_Page">our community does exceedingly well</a>. <strong>I&#8217;m eager to hear what others think of this approach. What concerns would you have for this approach? What potential benefits do you see?</strong></p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:2">
<p>I mean no offense to human children with this simplified example. Surely you can learn more than just string replacements.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:3">
<p>I started building some string analysis toys in JavaScript today, such as a <a href="http://mitcho.com/code/ubiquity/levenshtein/">Levenshtein difference demo</a>.&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:1">
<p>The linguists in the audience may note that this parser&#8217;s modular design is indeed in the spirit of the <a href="http://en.wikipedia.org/wiki/principles and parameters">principles and parameters</a> framework.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/ubiquity-i18n-questions-to-ask/' rel='bookmark' title='Ubiquity i18n: questions to ask'>Ubiquity i18n: questions to ask</a></li>
<li><a href='http://mitcho.com/blog/projects/localizing-ubiquity-an-open-letter-to-linguists/' rel='bookmark' title='Localizing Ubiquity: an open letter to linguists'>Localizing Ubiquity: an open letter to linguists</a></li>
<li><a href='http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/' rel='bookmark' title='Writing commands with semantic roles'>Writing commands with semantic roles</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/automating-the-linguists-job/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Ubiquity i18n: questions to ask</title>
		<link>http://mitcho.com/blog/projects/ubiquity-i18n-questions-to-ask/</link>
		<comments>http://mitcho.com/blog/projects/ubiquity-i18n-questions-to-ask/#comments</comments>
		<pubDate>Mon, 23 Mar 2009 10:13:37 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[collaboration]]></category>
		<category><![CDATA[commands]]></category>
		<category><![CDATA[contribute]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[survey]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1611</guid>
		<description><![CDATA[I have recently traveled a fair amount and have met many people excited about the Ubiquity project and its localization efforts. &#8220;I want to help,&#8221; they say, but many are unsure where to start. For a linguist, studying a language involves looking at instances of that language as data. To this end, we as [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/' rel='bookmark' title='Contribute: how your language identifies its arguments'>Contribute: how your language identifies its arguments</a></li>
<li><a href='http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/' rel='bookmark' title='Writing commands with semantic roles'>Writing commands with semantic roles</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/' rel='bookmark' title='Ubiquity in Firefox: Focus on Japanese'>Ubiquity in Firefox: Focus on Japanese</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>I have recently traveled a fair amount and have met many people excited about <a href="http://ubiquity.mozilla.com">the Ubiquity project</a> and <a href="https://wiki.mozilla.org/Labs/Ubiquity/i18n">its localization efforts</a>. &#8220;I want to help,&#8221; they say, but many are unsure where to start.</p>

<p>For a linguist, studying a language involves looking at instances of that language as data. To this end, we as Ubiquity internationalizers need to collect some examples of <em>target utterances</em>. Here&#8217;s an example survey, based on <a href="https://wiki.mozilla.org/Taskfox/Verbs">Blair&#8217;s list of common Ubiquity verbs</a>, which could be a good starting point for native speakers who want to contribute information on their language.</p>

<p><span id="more-1611"></span></p>

<hr style='border-top: 2px gray dashed; height: 0; color: white;'/>

<h2>A survey for Ubiquity localization</h2>

<h3>Instructions</h3>

<p>How would you express the following commands in your language? The words in CAPITAL LETTERS do not need to be translated. Feel free to give multiple possible answers for each command.</p>

<p>Try to express the same command rather than forcing a &#8220;literal translation&#8221;; for example, if there&#8217;s no &#8220;map&#8221; verb in your language, you could translate example (8) as <code>lookup a map of PLACE</code>. Please keep in mind that the <a href="http://en.wikipedia.org/wiki/addressee">addressee</a> is a computer.</p>

<h3>Basic word order / argument structure</h3>

<ol>
<li><code>search HELLO</code></li>
<li><code>search HELLO with google</code></li>
<li><code>translate HELLO from English to French</code></li>
<li><code>lookup the weather for PLACE</code></li>
<li><code>shop for SHOES with Amazon</code></li>
<li><code>email HELLO to Bill</code></li>
<li><code>email HELLO to ADDRESS</code></li>
<li><code>map PLACE</code></li>
<li><code>find HELLO</code></li>
<li><code>tab to HELLO</code> or <code>switch to HELLO tab</code></li>
</ol>

<p>&#8230;</p>

<h3>Pronominal/deictic arguments (aka &#8220;magic words&#8221;)</h3>

<ol>
<li><code>search this with google</code></li>
<li><code>translate this to French</code></li>
<li><code>bookmark this tab</code></li>
</ol>

<p>&#8230;</p>

<hr style='border-top: 2px gray dashed; height: 0; color: white;'/>

<h3>How this data is used</h3>

<p>Responses to these surveys would be used to identify certain salient features of the language, such as <a href="http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/">how the language codes for its arguments</a> (for example using <a href="http://en.wikipedia.org/wiki/adpositions">adpositions</a>, <a href="http://en.wikipedia.org/wiki/case marking">case marking</a>, or word order) and whether commands tend to be verb-initial or verb-final. Individual case markings, for example, can be identified by comparing <em><a href="http://en.wikipedia.org/wiki/minimal pairs">minimal pairs</a></em>. By comparing items (1) and (2), we can learn how <code>google</code> in an instrumental role is marked; by comparing item (2) with the first &#8220;magic word&#8221; example, we can identify the appropriate &#8220;magic word&#8221; and determine whether the language uses any <a href="http://en.wikipedia.org/wiki/clitics">clitics</a>.</p>
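
<p>To make the comparison of items (1) and (2) concrete, here is a small JavaScript sketch. It is an illustration only: <code>extraTokens</code> and <code>guessMarker</code> are hypothetical names, and it assumes that words are separated by spaces and that the marker is a separate word (an adposition) rather than a case ending fused to the noun.</p>

<pre class="javascript" style="font-family:monospace;">// Words present in the longer sentence but not in the shorter one.
function extraTokens(shorter, longer) {
  var seen = shorter.toLowerCase().split(/\s+/);
  return longer.toLowerCase().split(/\s+/).filter(function (token) {
    return seen.indexOf(token) === -1;
  });
}

// Whatever remains after removing the known argument is a candidate role marker.
function guessMarker(shorter, longer, argument) {
  var extras = extraTokens(shorter, longer);
  var index = extras.indexOf(argument.toLowerCase());
  if (index === -1) { return null; } // the argument was not among the new words
  extras.splice(index, 1);
  if (extras.length === 0) { return { marker: null, position: null }; }
  return {
    marker: extras.join(' '),
    // 'before' suggests a preposition, 'after' a postposition
    position: index === 0 ? 'after' : 'before'
  };
}</pre>

<p>For a hypothetical pair of responses such as <code>zoek HELLO</code> and <code>zoek HELLO met google</code>, calling <code>guessMarker</code> with <code>google</code> as the known argument would suggest <code>met</code> as a marker placed before the argument. Languages that mark roles with case endings or word order alone would need a different comparison.</p>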

<h3>Data collection</h3>

<p>In the future we ideally could build a web-based system to collect these &#8220;utterances.&#8221; We could also use such a system to automatically test our parsers in different languages against the sentences in the <em>command-bank,</em> or ultimately even generate parser parameters based on those sentences. That would essentially reduce the parser-construction process to a more run-of-the-mill string translation process.</p>
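
<p>The automatic testing idea could look something like the following JavaScript sketch. Everything here is an assumption made for illustration: the <code>testParser</code> function, the shape of a command-bank entry (an <code>input</code> sentence plus its <code>expected</code> parse), and the comparison by <code>JSON.stringify</code>, which is sensitive to property order.</p>

<pre class="javascript" style="font-family:monospace;">// Run a parser over every command-bank entry and collect the mismatches.
// "parse" is any function from an input string to a parse object.
function testParser(parse, commandBank) {
  var failures = [];
  for (var entry of commandBank) {
    var actual = JSON.stringify(parse(entry.input));
    var expected = JSON.stringify(entry.expected);
    if (actual !== expected) {
      failures.push({ input: entry.input, expected: expected, actual: actual });
    }
  }
  return failures; // an empty list means every sentence parsed as expected
}</pre>

<p>Each language&#8217;s parser could then be run against its own translated sentences, with the list of failures pointing to parameters that still need adjusting.</p>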
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/' rel='bookmark' title='Contribute: how your language identifies its arguments'>Contribute: how your language identifies its arguments</a></li>
<li><a href='http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/' rel='bookmark' title='Writing commands with semantic roles'>Writing commands with semantic roles</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/' rel='bookmark' title='Ubiquity in Firefox: Focus on Japanese'>Ubiquity in Firefox: Focus on Japanese</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/ubiquity-i18n-questions-to-ask/feed/</wfw:commentRss>
		<slash:comments>21</slash:comments>
		</item>
		<item>
		<title>Ubiquity in Portuguese</title>
		<link>http://mitcho.com/blog/link/ubiquity-in-portuguese/</link>
		<comments>http://mitcho.com/blog/link/ubiquity-in-portuguese/#comments</comments>
		<pubDate>Thu, 05 Mar 2009 07:49:33 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[link]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[l10n]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[Portuguese]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1547</guid>
		<description><![CDATA[Felipe, a Ubiquity user, put together a wonderful look at what Ubiquity might look like in Portuguese. He has some great points here, particularly regarding the &#8220;map&#8221; verb used in English: Felipe points out that Portuguese does not have a very common &#8220;map&#8221; verb and that it would be much more common to enter me [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/localizing-ubiquity-an-open-letter-to-linguists/' rel='bookmark' title='Localizing Ubiquity: an open letter to linguists'>Localizing Ubiquity: an open letter to linguists</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/' rel='bookmark' title='Ubiquity in Firefox: Focus on Japanese'>Ubiquity in Firefox: Focus on Japanese</a></li>
<li><a href='http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/' rel='bookmark' title='Writing commands with semantic roles'>Writing commands with semantic roles</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>Felipe, a Ubiquity user, put together a wonderful look at <a href="http://felipe.wordpress.com/2009/03/03/thinking-ubiquity-in-portuguese/">what Ubiquity might look like in Portuguese</a>. He has some great points here, particularly regarding the &#8220;map&#8221; verb used in English: Felipe points out that Portuguese does not have a very common &#8220;map&#8221; verb and that it would be much more common to enter <code>me dê</code> (literally <em>me give</em>), using a verb to <em>request</em> a map. This is a great example of how Jono&#8217;s <a href="http://jonoscript.wordpress.com/2009/01/24/overlord-verbs-a-proposal/">overlord verbs proposal</a> may be an important aspect of our i18n efforts. The post is also timely, as we&#8217;ve recently been discussing in our <a href="https://wiki.mozilla.org/Labs/Ubiquity/Meetings">regular meetings</a> (open to all!) that Portuguese could be the focus of our next parser construction efforts.</p>

<p><strong>What would the challenges be for Ubiquity in your language?</strong> We&#8217;d love to see an increasing number of blog posts on this topic in different languages. Thanks Felipe! ^^</p>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/localizing-ubiquity-an-open-letter-to-linguists/' rel='bookmark' title='Localizing Ubiquity: an open letter to linguists'>Localizing Ubiquity: an open letter to linguists</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/' rel='bookmark' title='Ubiquity in Firefox: Focus on Japanese'>Ubiquity in Firefox: Focus on Japanese</a></li>
<li><a href='http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/' rel='bookmark' title='Writing commands with semantic roles'>Writing commands with semantic roles</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/link/ubiquity-in-portuguese/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Unnatural by design</title>
		<link>http://mitcho.com/blog/observation/unnatural-by-design/</link>
		<comments>http://mitcho.com/blog/observation/unnatural-by-design/#comments</comments>
		<pubDate>Sun, 01 Mar 2009 19:22:00 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[observation]]></category>
		<category><![CDATA[awkward]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[food]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[menu]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[photo]]></category>
		<category><![CDATA[translation]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1533</guid>
		<description><![CDATA[I&#8217;m flying over the Pacific Ocean right now, but a little bit of language caught my eye. Here&#8217;s a picture of the menu for this flight, in three languages: English, Japanese, Chinese. What caught my eye is the line &#8220;served with ご一緒に 配,&#8221; meant to be read as part of &#8220;Beef in BBQ sauce&#8230; served [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/' rel='bookmark' title='Three ways to argue over arguments'>Three ways to argue over arguments</a></li>
<li><a href='http://mitcho.com/blog/observation/testing-googles-language-detection/' rel='bookmark' title='Testing Google&#8217;s Language Detection'>Testing Google&#8217;s Language Detection</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/' rel='bookmark' title='Ubiquity in Firefox: Focus on Japanese'>Ubiquity in Firefox: Focus on Japanese</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m flying over the Pacific Ocean right now, but a little bit of language caught my eye. Here&#8217;s a picture of the menu for this flight, in three languages: English, Japanese, Chinese.</p>

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/03/menu1.jpg" alt="menu.jpg" border="0" width="650" height="459" /></p>

<p>What caught my eye is the line &#8220;served with ご一緒に 配,&#8221; meant to be read as part of &#8220;Beef in BBQ sauce&#8230; <strong>served with</strong> Pepsi&#8230;&#8221;. The Chinese 配 (<em>pèi</em>) is fine here, meaning &#8220;with,&#8221; but the Japanese &#8220;ご一緒に&#8221; (<em>goissho-ni</em>) seemed awkward to me.</p>

<p><span id="more-1533"></span></p>

<p>The issue is that this adverbial meaning &#8220;together&#8221; normally comes <em>after</em> the &#8220;what it&#8217;s with&#8221; in an order like (1) (glossed in (2)):</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
</pre></td><td class="code"><pre class="japanese" style="font-family:monospace;">A B-と       ご一緒に
A B-and/with together</pre></td></tr></table></div>


<p>In other words, where English and Chinese both would say &#8220;A with B&#8221;, it is most natural in Japanese to say the equivalent of &#8220;A B with (together)&#8221;.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> This is the reason why it seems unnatural to have anything between the &#8220;Beef in BBQ sauce&#8230;&#8221; line and &#8220;Pepsi&#8230;&#8221; line.</p>

<p>Looking at the rest of the menu, it&#8217;s clear that this isn&#8217;t a case where a native speaker wasn&#8217;t involved with the writing of the menu—the rest of the Japanese is perfect. <em>The Japanese modifier was inserted there just for the sake of parallel design, to the detriment of the text&#8217;s naturalness.</em> <strong>When have you seen design conflict with the structure of your language?</strong></p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>This can be generalized to a certain extent by noting that English and Chinese are both <a href="http://en.wikipedia.org/wiki/head-initial">head-initial</a> (aka &#8220;right branching&#8221;) languages, while Japanese is strongly <a href="http://en.wikipedia.org/wiki/head-final">head-final</a> (aka &#8220;left branching&#8221;).&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/' rel='bookmark' title='Three ways to argue over arguments'>Three ways to argue over arguments</a></li>
<li><a href='http://mitcho.com/blog/observation/testing-googles-language-detection/' rel='bookmark' title='Testing Google&#8217;s Language Detection'>Testing Google&#8217;s Language Detection</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/' rel='bookmark' title='Ubiquity in Firefox: Focus on Japanese'>Ubiquity in Firefox: Focus on Japanese</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/observation/unnatural-by-design/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Localizing Ubiquity: an open letter to linguists</title>
		<link>http://mitcho.com/blog/projects/localizing-ubiquity-an-open-letter-to-linguists/</link>
		<comments>http://mitcho.com/blog/projects/localizing-ubiquity-an-open-letter-to-linguists/#comments</comments>
		<pubDate>Thu, 26 Feb 2009 03:26:47 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[localization]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1512</guid>
		<description><![CDATA[Localizing Ubiquity: an open letter to linguists from mitcho on Vimeo. Below is a transcript of this video. Please distribute this video far and wide to anyone who may be interested. ^^ Dear linguist, My name is mitcho and I am currently involved with an exciting open source community project at Mozilla Labs called Ubiquity. [...]
]]></description>
			<content:encoded><![CDATA[<p><object width="649" height="365"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=3390792&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=3390792&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="649" height="365"></embed></object><br /><a href="http://vimeo.com/3390792">Localizing Ubiquity: an open letter to linguists</a> from <a href="http://vimeo.com/mitchoyoshitaka">mitcho</a> on <a href="http://vimeo.com">Vimeo</a>.</p>

<p><em>Below is a transcript of this video. Please distribute this video far and wide to anyone who may be interested. ^^</em></p>

<p><span id="more-1512"></span></p>

<p>Dear linguist,</p>

<p>My name is <a href="http://mitcho.com">mitcho</a> and I am currently involved with an exciting open source community project at <a href="http://labs.mozilla.com">Mozilla Labs</a> called <a href="http://ubiquity.mozilla.com">Ubiquity</a>. The project&#8217;s goal is to empower the user to accomplish more on the web, and to do this, Ubiquity connects disparate services from across the web with language. Let me show you what I mean.</p>

<h3>Example 1</h3>

<p>Suppose you&#8217;re emailing a friend to tell them where to meet later. It would be great if you could insert a map in your email, but normally you would have to go to a maps application and insert a link.</p>

<p>But with Ubiquity installed, you can just pull up Ubiquity with a keystroke, enter <code>map Shinjuku</code>, pick the right result and rearrange it, and click &#8220;insert map.&#8221; It&#8217;s that simple.</p>

<h3>Example 2</h3>

<p>Let&#8217;s say you&#8217;ve stumbled upon <a href="http://www.aujourdhuilejapon.com/actualites-japon-en-recevant-aso-obama-a-rassure-le-japon-sur-ses-intentions-6191.asp">an article in French</a> and you want to send a bit of it to a friend. Again, you could just send a link, but with Ubiquity we can do much better than that. We&#8217;ll first select the quote we want and <code>translate it to English</code>. There we go. Now I&#8217;ll select the bit I actually want to send, pull up Ubiquity and execute <code>email this to Aza</code>. Because Ubiquity interacts with my address book, it knows who I mean by &#8220;Aza&#8221; and composes my email for me.</p>

<p>As you can see, the Ubiquity interface makes it easy to accomplish tasks with minimal interruption to a user&#8217;s workflow. New verbs are easy to write and there&#8217;s already an active community writing new verbs and sharing them with users.</p>

<h3>The localization of Ubiquity</h3>

<p>Right now Ubiquity offers a basic English parser and an experimental Japanese one, but Firefox itself currently ships with 55 distinct localizations. We hope to localize Ubiquity into a number of these languages.</p>

<p>In order to accomplish this, we need support from viewers like you.</p>

<p>For each language—whether Basque, Polish, or Esperanto—we hope to build a parser to take the command and pick out its verb and the arguments and then identify the semantic roles of each argument. Ubiquity will then apply the verb to those arguments and execute.</p>

<p>We&#8217;ll need both data on and expertise in the languages we hope to localize. If you&#8217;re a native speaker or researcher, what is the structure of commands in your language? How do you code for different kinds of arguments? What do your pronouns look like? We&#8217;d love to see blog posts, discussions, or quick mockups on these and other topics.</p>

<p>For those of you with NLP experience, we&#8217;d love to get more folks involved in writing these parsers. And in the near future, we&#8217;d appreciate anyone who wants to simply try out Ubiquity in their language and give us feedback.</p>

<p>The idea of using natural language to direct computers is far from new. But in this limited context of individual commands—single imperative clauses without complexities such as negation or quantification—I believe there is great potential for Ubiquity to support other languages and bring this interface—this new way of interacting with the web—to more people in more communities.</p>

<p><a href="http://ubiquity.mozilla.com">ubiquity.mozilla.com</a>
<br />
bringing linguistics to a browser near you</p>

<p><a href="http://wiki.mozilla.org/Labs/Ubiquity/i18n">wiki.mozilla.org/Labs/Ubiquity/i18n</a>
<br />
help us localize the future</p>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/localizing-ubiquity-an-open-letter-to-linguists/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Ubiquity in Firefox: Focus on Japanese</title>
		<link>http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/</link>
		<comments>http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/#comments</comments>
		<pubDate>Fri, 20 Feb 2009 11:08:14 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[argument structure]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[Firefox]]></category>
		<category><![CDATA[interface]]></category>
		<category><![CDATA[Japanese language]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[mockup]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[verbs]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1466</guid>
		<description><![CDATA[One of the eventual goals of the Ubiquity project is to bring some of its functionality and ideas to Firefox proper. To this end, Aza has been exploring some possible options for what that would look like (round 1, round 2). All of his mockups, however, use English examples. I&#8217;m going to start exploring what [...]
]]></description>
			<content:encoded><![CDATA[<p>One of the eventual goals of the <a href="http://ubiquity.mozilla.com">Ubiquity project</a> is to bring some of its functionality and ideas to Firefox proper. To this end, <a href="http://azarask.in">Aza</a> has been exploring some possible options for what that would look like (<a href="http://www.azarask.in/blog/post/ubiquity-in-firefox-round-1/">round 1</a>, <a href="http://www.azarask.in/blog/post/ubiquity-in-the-firefox-round-2/">round 2</a>). All of his mockups, however, use English examples. I&#8217;m going to start exploring what Ubiquity in Firefox might look like in different kinds of languages. Let&#8217;s kick this off with my mother tongue, Japanese.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<p><em>今後多様な言語に対応したFirefox内のUbiquityを検討していきますが、その中でも今日は日本語をとりあげます。後日日本語で同じ内容を投稿するつもりです。^^</em> <strong>日本語でのコメントも大歓迎です！</strong></p>

<p><span id="more-1466"></span></p>

<h3>What commands look like in Japanese</h3>

<p>Japanese is not just a verb-final language but a strongly <a href="http://en.wikipedia.org/wiki/head-final">head-final</a> one, meaning that it has postpositions instead of prepositions, direct objects come before verbs, and adjectives precede nouns. In terms of <a href="http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/">how it identifies its arguments</a>, every argument has a postposition/case marker (called a <em>particle</em> in the Japanese literature) which marks its role in the sentence.</p>

<p>Two common particles we&#8217;ll look at in this example are -を (<em>-o</em>), which marks the direct object (accusative case, you might say), and -に (<em>-ni</em>), which acts like English &#8220;to&#8221; (dative case). The example sentence we&#8217;ll look at today is:</p>

<table border='0'>
<tr><td>ケーキを</td><td>ブレアに</td><td>送って</td><td>(ください)</td></tr>
<tr><td><em>kēki-o</em></td><td><em>burea-ni</em></td><td><em>okuʔte</em></td><td><em>kudasai</em></td></tr>
<tr><td>cake.ACC</td><td>Blair.DAT</td><td>send.IMP</td><td>&#8220;please&#8221;</td></tr>
<tr><td colspan='4'>&#8220;Please send a cake to Blair.&#8221;</td></tr>
</table>

<p>(Note: ʔ is a <a href="http://en.wikipedia.org/wiki/glottal stop">glottal stop</a>. ACC=accusative, DAT=dative, and IMP=imperative form.)</p>

<p>That final ください is often dropped in very casual speech and, as it adds no new information, we&#8217;ll assume today that the user will not enter it. Finally, Japanese doesn&#8217;t use spaces in its orthography, so the actual input would be &#8220;ケーキをブレアに送って&#8221;.</p>

<h3>Mockup 1: Particle identification</h3>

<p>One of the major hurdles in working with Japanese is that there are no spaces between the words. The natural first step is to split the sentence up into words, but this is a very difficult problem in <a href="http://en.wikipedia.org/wiki/Natural Language Processing">NLP</a>, one that <a href="http://research.microsoft.com/en-us/projects/japanesenlp/default.aspx">big-name research groups</a> actively work on.</p>

<p>Fortunately, however, in <a href="http://www.azarask.in/blog/post/solving-the-it-problem/">&#8220;Solving the &#8216;It&#8217; Problem&#8221;</a> Aza suggests that, when we encounter ambiguity in our input, we can <em>go ask the user</em>. Great minds think alike, and computer scientist <a href="http://en.wikipedia.org/wiki/Jean E. Sammet">Jean E. Sammet</a> suggested the same idea <a href="http://doi.acm.org/10.1145/365230.365274">way back in 1966</a>:</p>

<blockquote>
  <p>Using English [or any other natural language] definitely involves the requirement for the computer (or more accurately its programming system) to query the user about any possible ambiguity.</p>
</blockquote>

<p>Parsing a sentence into words, in the limited context of Ubiquity, is really about identifying the particles which mark the end of each argument. Here&#8217;s a mockup of an application of the Sammet-Raskin Method to this problem:</p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/02/particle-id.png" alt="particle-id.png" border="0" /></center></p>
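
<p>The particle-identification step in this mockup can be sketched roughly as follows. This is only an illustration in Python, not Ubiquity&#8217;s actual parser: the <code>PARTICLES</code> table and the functions <code>find_candidates</code> and <code>split_arguments</code> are hypothetical names, and the list of confirmed positions stands in for the user&#8217;s answers.</p>

```python
# Hypothetical particle table: only the two particles from the example.
PARTICLES = {"を": "ACC", "に": "DAT"}


def find_candidates(text):
    """Return the index of every character that could be a particle."""
    return [i for i, ch in enumerate(text) if ch in PARTICLES]


def split_arguments(text, confirmed):
    """Split text at the particle positions the user has confirmed.

    Returns (arguments, remainder): arguments is a list of
    (argument_text, role) pairs, and remainder is whatever follows the
    last confirmed particle (in a verb-final command, the verb).
    """
    arguments = []
    start = 0
    for i in sorted(confirmed):
        arguments.append((text[start:i], PARTICLES[text[i]]))
        start = i + 1
    return arguments, text[start:]


command = "ケーキをブレアに送って"
candidates = find_candidates(command)
# Here we pretend the user accepted every suggested particle.
arguments, verb = split_arguments(command, candidates)
print(candidates)
print(arguments, verb)
```

<p>In this sketch the user&#8217;s job is to prune the candidate list. A word such as にんじん (<em>ninjin</em>, &#8220;carrot&#8221;) begins with a character that looks like the particle に, and pressing escape would simply leave that index out of <code>confirmed</code>.</p>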

<p><strong>Pros:</strong> This completely takes care of the word-breaking problem, with minimal arbitration from the user. The parser knows <em>exactly</em> what arguments it&#8217;s dealing with and the visual feedback means the user won&#8217;t be surprised by the parse.</p>

<p><strong>Cons:</strong> Most of the particles/postpositions we&#8217;d have to deal with are a single character, so they may show up pretty often within words, in which case it would be quite annoying to have to press escape after each one.</p>

<p>An even smarter system, when wanting to mark a character as a particle, would first check to see that the argument (before the particle) is a valid argument type for that particle. If the check fails, it doesn&#8217;t have to bother with suggesting that character as a particle. This may cut down on the false positives.</p>
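
<p>As a rough sketch of that check, assume each particle comes with a test that the preceding argument must pass. Everything here is hypothetical and chosen only for illustration: the <code>CONTACTS</code> set, the two checks, and the greedy left-to-right scan are assumptions, not how Ubiquity&#8217;s noun types actually work.</p>

```python
# Hypothetical address book, for illustration only.
CONTACTS = {"ブレア", "アザ"}


def is_contact(text):
    """A -に argument must name a known contact (an assumption)."""
    return text in CONTACTS


def is_nonempty(text):
    """A -を argument can be anything, as long as it is not empty."""
    return bool(text)


# Each particle maps to the check its preceding argument must pass.
PARTICLE_CHECKS = {"を": is_nonempty, "に": is_contact}


def plausible_particles(text):
    """Return indices of particle characters worth suggesting.

    A character is suggested only if the text between the previously
    accepted particle (or the start of the input) and this character
    passes the check for that particle.
    """
    accepted = []
    start = 0
    for i, ch in enumerate(text):
        check = PARTICLE_CHECKS.get(ch)
        if check is not None and check(text[start:i]):
            accepted.append(i)
            start = i + 1
    return accepted


print(plausible_particles("ケーキをブレアに送って"))
print(plausible_particles("にんじんをブレアに送って"))
```

<p>In the second input the leading に is never suggested, because the empty string before it is not a contact. A greedy scan like this can still guess wrong, so the user would keep the final say.</p>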

<h3>Smart suggestions: what works, what doesn&#8217;t</h3>

<p>One of the key suggestions in Aza&#8217;s mockups is a way to choose prepositions while entering your arguments, based on the current verb.</p>

<p>For example, here, the <code>translate</code> command accepts a direct object, a <em>to</em>-object, and a <em>from</em>-object, so little <code>to</code> and <code>from</code> markers magically show up on the right side, making the appropriate prepositions (and by extension the appropriate arguments) discoverable. I think this line of thinking is a really good one, at least for English.</p>

<p><center><a class='limages' rel='lightbox[verbfinal]' href='http://farm4.static.flickr.com/3359/3272673947_05b4a21881_o.jpg'><img src='http://farm4.static.flickr.com/3359/3272673947_14b59c2aa1.jpg'></a></center></p>
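
<p>In code, this kind of verb-driven suggestion amounts to a lookup from the verb to the markers it accepts. The sketch below is a minimal Python illustration under assumed names: <code>VERB_MARKERS</code> and <code>suggest_markers</code> are hypothetical, and the entries other than <code>translate</code> are guesses based on the example commands.</p>

```python
# Hypothetical verb table: each verb lists the markers (prepositions)
# its arguments can take. None stands for the unmarked direct object.
VERB_MARKERS = {
    "translate": [None, "to", "from"],
    "email": [None, "to"],
    "map": [None],
}


def suggest_markers(verb, used):
    """Return the prepositions the verb accepts that are not yet used."""
    return [marker for marker in VERB_MARKERS.get(verb, [])
            if marker is not None and marker not in used]


print(suggest_markers("translate", []))
print(suggest_markers("translate", ["to"]))
```

<p>The lookup only works because the verb is already known when the arguments are being typed, which is exactly what a verb-initial command guarantees.</p>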

<p><strong>In a verb-final language, however, you enter the arguments first and then the verb, making this strategy of suggesting appropriate arguments impossible.</strong> Note that in the user-contributed spreadsheet of <a href="http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/">how languages identify their arguments</a> we see that about a quarter of the languages we looked at are verb-final—that is, with Subject-Object-Verb canonical word order.</p>

<p>Instead of seeing this as a disadvantage, however, let&#8217;s see what verb-final order <em>allows</em> us to do.</p>

<h3>Mockup 2: A different kind of suggestion</h3>

<p>Not all verbs allow for every different kind of particle. For example, it doesn&#8217;t make sense to have a -に (<em>-ni</em>, &#8220;to&#8221; or dative) argument for a verb like 検索して (<em>kensaku-shite</em>, &#8220;search for&#8221;). In English we used this to suggest different types of arguments given a specific verb. In a verb-final language, we could do this <em>backwards</em>.</p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/02/verb-suggestion.png" alt="verb-suggestion.png" border="0" /></center></p>

<p><strong>Pros:</strong> This makes verbs highly discoverable, given a certain argument structure. For example, if you enter a few arguments, like a direct object, a &#8220;to&#8221; argument, and a &#8220;from&#8221; argument, it&#8217;ll suggest verbs that will do something to an object from somewhere to somewhere else. This way, you can easily try out verbs you didn&#8217;t even know existed. It&#8217;ll only give you verbs appropriate for your arguments, reducing the chance of writing an infelicitous command.</p>

<p><strong>Cons:</strong> Without knowing what kinds of actions are available, it may be difficult to know what kinds of arguments to enter in the first place. If you have a specific verb or service you want to use, it may be counterintuitive or downright tricky to start by guessing the right set of arguments.</p>

<p>In addition, from a technical point of view, this requires many of the prediction algorithms in English Ubiquity to run backwards. Ideally, there would be a closed (predetermined) class of particles and a predefined set of noun types. Verbs would not be able to define their own modifiers and noun classes as easily or freely as they can now.</p>
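
<p>Running the suggestion backwards could look something like the following Python sketch. It assumes a closed particle class and a fixed table of which roles each verb accepts. Both tables and the function <code>suggest_verbs</code> are hypothetical and cover only the two verbs mentioned in this post.</p>

```python
# Hypothetical closed class of particles and the roles they mark.
PARTICLE_ROLES = {"を": "ACC", "に": "DAT"}

# Hypothetical verb table: each verb lists the roles it accepts.
VERB_ROLES = {
    "送って": {"ACC", "DAT"},    # "send"
    "検索して": {"ACC"},         # "search for"
}


def suggest_verbs(particles):
    """Return the verbs that accept every role entered so far."""
    entered = {PARTICLE_ROLES[p] for p in particles}
    return [verb for verb, roles in VERB_ROLES.items()
            if entered.issubset(roles)]


print(suggest_verbs(["を"]))
print(suggest_verbs(["を", "に"]))
```

<p>Once a -に argument has been entered, 検索して drops out of the suggestions, matching the observation above that a dative argument makes no sense for a search verb.</p>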

<h3>Conclusion</h3>

<p>The properties and challenges of Japanese grammar require that we not simply copy the English behavior but instead think about what really makes sense in that language. That may be an important lesson as we move toward designing a localizable Ubiquity. Please post your questions and criticisms of this design, or post your own mockups!</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>Happy <a href="http://www.un.org/depts/dhl/language/index.html">International Mother Language Day</a>! ^^&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/feed/</wfw:commentRss>
		<slash:comments>86</slash:comments>
		</item>
		<item>
		<title>Contribute: how your language identifies its arguments</title>
		<link>http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/</link>
		<comments>http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/#comments</comments>
		<pubDate>Wed, 18 Feb 2009 09:37:18 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[coding properties]]></category>
		<category><![CDATA[contribute]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[grammatical relations]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1450</guid>
		<description><![CDATA[Earlier today I blogged on three different strategies languages use to mark the roles of different arguments: word order, marking on the arguments, and marking on the verbs. I gathered some data from the fantastic World Atlas of Language Structures to put together a survey of many of the languages on the Internet. For each [...]
]]></description>
			<content:encoded><![CDATA[<p>Earlier today <a href="http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/">I blogged on three different strategies</a> languages use to mark the roles of different arguments: word order, marking on the arguments, and marking on the verbs.</p>

<p>I gathered some data from the fantastic <a href="http://wals.info/">World Atlas of Language Structures</a> to put together a survey of many of the languages on the Internet. For each of the languages, I got the canonical word order and whether the language marks the roles of its arguments on the verb and/or on the arguments themselves.</p>

<iframe width='605' height='300' frameborder='0' src='http://spreadsheets.google.com/pub?key=pE-nN92qp_pa5P6YbUOw0HQ&#038;output=html'></iframe>

<p>As you can see, there are a number of data points that are still missing. <strong>Please contribute information on the languages you speak!</strong> You can <a href="http://spreadsheets.google.com/ccc?key=pE-nN92qp_pa5P6YbUOw0HQ">edit the spreadsheet on Google Docs</a>. Thanks!</p>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
	</channel>
</rss>

