<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>mitcho.com &#187; argument structure</title>
	<atom:link href="http://mitcho.com/blog/tag/argument-structure/feed/" rel="self" type="application/rss+xml" />
	<link>http://mitcho.com</link>
	<description></description>
	<lastBuildDate>Sat, 11 Feb 2012 12:23:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4-alpha-19719</generator>
		<item>
		<title>Adding Your Language to Ubiquity Parser 2</title>
		<link>http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/</link>
		<comments>http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/#comments</comments>
		<pubDate>Wed, 29 Apr 2009 11:44:20 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[how to]]></category>
		<category><![CDATA[argument structure]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[case marking]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[French]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[l10n]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[localization]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[semantic roles]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1956</guid>
		<description><![CDATA[NOTE: This blog post has now been added to the Ubiquity wiki and is updated there. Please disregard this article and instead follow these instructions. You&#8217;ve seen the video. You speak another language. And you&#8217;re wondering, &#8220;how hard is it to add my language to Ubiquity with Parser 2?&#8221; The answer: not that hard. With [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/' rel='bookmark' title='Ubiquity Parser: The Next Generation Demo'>Ubiquity Parser: The Next Generation Demo</a></li>
<li><a href='http://mitcho.com/blog/projects/rolling-out-the-roles/' rel='bookmark' title='Rolling out the Roles'>Rolling out the Roles</a></li>
<li><a href='http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/' rel='bookmark' title='Foxkeh demos Ubiquity Parser: The Next Generation'>Foxkeh demos Ubiquity Parser: The Next Generation</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p><strong>NOTE: This blog post has now been added to the Ubiquity wiki and is updated there. Please disregard this article and instead follow <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2/Localization_Tutorial">these instructions</a>.</strong></p>

<p>You&#8217;ve <a href="http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/">seen the video</a>. You speak another language. And you&#8217;re wondering, <strong>&#8220;how hard is it to add my language to Ubiquity with Parser 2?&#8221;</strong> The answer: <strong>not that hard.</strong> With a little bit of JavaScript and some knowledge of, and interest in, your own language, you&#8217;ll be able to get at least rudimentary Ubiquity functionality in your language. Follow along in this step-by-step guide and please <a href="http://ubiquity.mozilla.com/trac/ticket/662">submit your (even incomplete) language files</a>!</p>

<p><em>As Ubiquity Parser 2 evolves, there is a chance that this specification will change in the future. Keep abreast of such changes on the <a href="http://ubiquity.mozilla.com/planet/">Ubiquity Planet</a> and/or <a href="http://mitcho.com/blog/">this blog</a> (<a href="http://mitcho.com/blog/feed/blog-only/">RSS</a>).</em></p>

<p><span id="more-1956"></span></p>

<h3>Set up your environment</h3>

<p>If you&#8217;re new to Ubiquity core development, you&#8217;ll want to first read the <a href="http://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_0.1_Development_Tutorial">Ubiquity 0.1 Development Tutorial</a> to learn how to get a live copy of the Ubiquity repository using <a href="http://en.wikipedia.org/wiki/Mercurial">Mercurial</a>. Once you&#8217;ve set up your Firefox profile to use this development version, change the <code>extensions.ubiquity.parserVersion</code> value to 2 in <code>about:config</code> (as seen in <a href="http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-parser-2/">this demo video</a>) to verify that Parser 2 is working for you.</p>

<p>As you read along, you may find it beneficial to follow along in the languages currently included in Parser 2: <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/en.js">English</a>, <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/ja.js">Japanese</a>, <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/pt.js">Portuguese</a>, and <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/sv.js">Swedish</a> (and the incomplete <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/zh.js">Chinese</a> and <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/fr.js">French</a>).</p>

<h3>The structure of the language file</h3>

<p>Each language in Parser 2 gets its own file which acts as a <a href="https://developer.mozilla.org/En/Using_JavaScript_code_modules">JavaScript module</a>. You&#8217;ll need to look up the <a href="http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes">ISO 639-1 code for your language</a>. We&#8217;ll use English (code <code>en</code>) as an example here, so the JavaScript language file is called <code>en.js</code> and goes in the <code>/ubiquity/modules/parser/new/</code> directory of the repository.</p>

<p>Here is the basic template for a Ubiquity Parser 2 language file:</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
</pre></td><td class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">var</span> EXPORTED_SYMBOLS <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span><span style="color: #3366CC;">&quot;makeEnParser&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000066; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">typeof</span> window<span style="color: #009900;">&#41;</span> <span style="color: #339933;">==</span> <span style="color: #3366CC;">'undefined'</span><span style="color: #009900;">&#41;</span> <span style="color: #006600; font-style: italic;">// kick it chrome style</span>
  Components.<span style="color: #660066;">utils</span>.<span style="color: #003366; font-weight: bold;">import</span><span style="color: #009900;">&#40;</span><span style="color: #3366CC;">&quot;resource://ubiquity/modules/parser/new/parser.js&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #003366; font-weight: bold;">function</span> makeEnParser<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #003366; font-weight: bold;">var</span> en <span style="color: #339933;">=</span> <span style="color: #003366; font-weight: bold;">new</span> Parser<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'en'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
...
&nbsp;
  <span style="color: #000066; font-weight: bold;">return</span> en<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>


<p>After lines 1-4 which set up the <a href="https://developer.mozilla.org/En/Using_JavaScript_code_modules">JavaScript module</a>, everything else is wrapped in a factory function called <code>makeLaParser</code> (for Latin) or <code>makeEnParser</code> (for English, <code>en</code>) or <code>makeFrParser</code> (for French, <code>fr</code>), etc. This function initializes the new <code>Parser</code> object (line 7) with the appropriate language code, sets a bunch of parameters (elided above) and returns it. That&#8217;s it!</p>

<p>Now let&#8217;s walk through some of the parameters you must set to get your language working. For reference, the properties the language parser object is required to have are: <code>branching</code>, <code>anaphora</code>, and <code>roles</code>.</p>

<h3>Identifying your branching parameter</h3>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;">  en.<span style="color: #660066;">branching</span> <span style="color: #339933;">=</span> <span style="color: #3366CC;">'right'</span><span style="color: #339933;">;</span> <span style="color: #006600; font-style: italic;">// or 'left'</span></pre></div></div>


<p>One of the first things you&#8217;ll have to set for your parser is <strong>the <code>branching</code> parameter</strong>. Ubiquity Parser 2 uses the branching parameter to decide which direction to look for an argument after finding a delimiter or &#8220;role marker&#8221; (most often, these are <a href="http://en.wikipedia.org/wiki/adposition">prepositions or postpositions</a>). For example, in English &#8220;from&#8221; is a delimiter for the <code>source</code> role and its argument is on its right.</p>

<table>
<tr><td>&nbsp;</td><td>&nbsp;</td><td colspan='2' style='background: transparent url(http://mitcho.com/i/cccarrow-right.png) no-repeat right bottom'>&nbsp;</td></tr>
<tr><td><b>to</b></td><td>Mary</td><td><b>from</b></td><td>John</td></tr>
</table>

<p>So &#8220;John&#8221; is a possible argument for the <code>source</code> role, but &#8220;Mary&#8221; should not be. Ubiquity can figure this out because English has the property <code>en.branching = 'right'</code>.</p>

<p>In Japanese, on the other hand, the argument of a delimiter like から (&#8220;from&#8221;) is found on the left of that delimiter, so <code>ja.branching = 'left'</code>.</p>

<table>
<tr><td colspan='2' style='background: transparent url(http://mitcho.com/i/cccarrow-left.png) no-repeat left bottom'>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><td>メアリー</td><td><b>-から</b></td><td>ジョン</td><td><b>-に</b></td></tr>
<tr><td>Mary</td><td><b>from</b></td><td>John</td><td><b>to</b></td></tr>
</table>

<p>In general, if your language has prepositions, you should use <code>.branching = 'right'</code> and if your language has postpositions, you can use <code>.branching = 'left'</code>.</p>
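<p>To make the direction concrete, here is a minimal sketch of the idea. It is not the actual Parser 2 algorithm: the <code>argumentSide</code> helper is a hypothetical name invented for this illustration, and it only shows which side of a delimiter is searched for each <code>branching</code> value.</p>

```javascript
// Illustrative only: NOT the real Parser 2 code.
// argumentSide is a hypothetical helper. Given a list of words, the index of
// a delimiter, and a branching value, it returns the words on the side where
// the argument of that delimiter is expected.
function argumentSide(words, delimiterIndex, branching) {
  if (branching === 'right') {
    // prepositions: the argument follows the delimiter
    return words.slice(delimiterIndex + 1);
  }
  // postpositions: the argument precedes the delimiter
  return words.slice(0, delimiterIndex);
}

var english = ['to', 'Mary', 'from', 'John'];
var japanese = ['メアリー', 'から', 'ジョン', 'に'];

// Candidate words for the source role in each language.
console.log(argumentSide(english, english.indexOf('from'), 'right'));
console.log(argumentSide(japanese, japanese.indexOf('から'), 'left'));
```

<p>With the example sentences from the tables above, the English call looks to the right of &#8220;from&#8221; and the Japanese call looks to the left of から.</p>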

<p><strong>For more info</strong>:</p>

<ul>
<li>see <a href="http://en.wikipedia.org/wiki/Branching (linguistics)">branching</a> on Wikipedia.</li>
</ul>

<h3>Defining your roles</h3>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;">  en.<span style="color: #660066;">roles</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'goal'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'to'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'source'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'from'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'position'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'at'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'position'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'on'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'alias'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'as'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'instrument'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'using'</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
    <span style="color: #009900;">&#123;</span>role<span style="color: #339933;">:</span> <span style="color: #3366CC;">'instrument'</span><span style="color: #339933;">,</span> delimiter<span style="color: #339933;">:</span> <span style="color: #3366CC;">'with'</span><span style="color: #009900;">&#125;</span>
  <span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span></pre></div></div>


<p>The second required property is the inventory of semantic roles and their corresponding delimiters. Each entry has a <code>role</code> from the <a href="http://mitcho.com/blog/projects/rolling-out-the-roles/">inventory of semantic roles</a> and a corresponding delimiter. Note that this mapping can be <a href="http://en.wikipedia.org/wiki/many-to-many (data model)">many-to-many</a>, i.e., each role can have multiple possible delimiters and different roles can have shared delimiters. Try to make sure to cover all of the roles in the <a href="http://mitcho.com/blog/projects/rolling-out-the-roles/">inventory of semantic roles</a>.</p>
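<p>The many-to-many mapping is easier to see when the list is indexed by delimiter. The sketch below is only an illustration: <code>rolesByDelimiter</code> is a hypothetical helper, not part of Parser 2, and the two &#8220;in&#8221; entries are added here as an assumed example of a delimiter shared by two roles.</p>

```javascript
// A roles list in the format shown above. The two 'in' entries are an
// assumption added for this sketch, to show one delimiter shared by two roles.
var roles = [
  {role: 'goal', delimiter: 'to'},
  {role: 'source', delimiter: 'from'},
  {role: 'position', delimiter: 'at'},
  {role: 'position', delimiter: 'on'},
  {role: 'position', delimiter: 'in'},
  {role: 'format', delimiter: 'in'},
  {role: 'instrument', delimiter: 'using'},
  {role: 'instrument', delimiter: 'with'}
];

// Hypothetical helper (not a Parser 2 function): index the list by delimiter.
function rolesByDelimiter(roleList) {
  var index = {};
  roleList.forEach(function (entry) {
    if (!index[entry.delimiter]) {
      index[entry.delimiter] = [];
    }
    index[entry.delimiter].push(entry.role);
  });
  return index;
}

var byDelimiter = rolesByDelimiter(roles);
console.log(byDelimiter['in']);   // one delimiter, two possible roles
console.log(byDelimiter['with']); // one delimiter, one role
```

<p>Here <code>instrument</code> has two delimiters (&#8220;using&#8221; and &#8220;with&#8221;), while &#8220;in&#8221; is shared by <code>position</code> and <code>format</code>.</p>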

<p><strong>For more info:</strong></p>

<ul>
<li><a href="http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/">Writing commands with semantic roles</a></li>
<li><a href="http://mitcho.com/blog/projects/rolling-out-the-roles/">the proposed inventory of semantic roles</a></li>
<li>Wikipedia entry on <a href="http://en.wikipedia.org/wiki/thematic relations">thematic relations</a></li>
</ul>

<h3>Entering your anaphora (&#8220;magic words&#8221;)</h3>


<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;">  en.<span style="color: #660066;">anaphora</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span><span style="color: #3366CC;">&quot;this&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;that&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;it&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;selection&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;him&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;her&quot;</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;them&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span></pre></div></div>


<p>The final required property is the <code>anaphora</code> property which takes a list of &#8220;magic words&#8221;. Currently there is no distinction between all the different <a href="http://en.wikipedia.org/wiki/deixis">deictic</a> <a href="http://en.wikipedia.org/wiki/anaphora (linguistics)">anaphora</a> which might refer to different things.</p>
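<p>Putting the three required properties together, an English language file might look roughly like the sketch below. To keep the example runnable on its own, <code>Parser</code> here is a tiny stand-in rather than the real object imported from <code>parser.js</code>, and <code>missingProperties</code> is a hypothetical helper added only to check that nothing required was forgotten.</p>

```javascript
// Stand-in for the real Parser from parser.js, so this sketch runs by itself.
// The real constructor does much more than store the language code.
function Parser(lang) {
  this.lang = lang;
}

function makeEnParser() {
  var en = new Parser('en');

  // Prepositions: arguments are found to the right of their delimiters.
  en.branching = 'right';

  // Semantic roles and their delimiters (values from the examples above).
  en.roles = [
    {role: 'goal', delimiter: 'to'},
    {role: 'source', delimiter: 'from'},
    {role: 'position', delimiter: 'at'},
    {role: 'position', delimiter: 'on'},
    {role: 'alias', delimiter: 'as'},
    {role: 'instrument', delimiter: 'using'},
    {role: 'instrument', delimiter: 'with'}
  ];

  // The "magic words".
  en.anaphora = ['this', 'that', 'it', 'selection', 'him', 'her', 'them'];

  return en;
}

// Hypothetical helper: list the required properties that are still unset.
function missingProperties(parser) {
  return ['branching', 'anaphora', 'roles'].filter(function (name) {
    return parser[name] === undefined;
  });
}

console.log(missingProperties(makeEnParser()));   // nothing missing
console.log(missingProperties(new Parser('fr'))); // all three missing
```

<p>In a real language file, the module boilerplate from the template (the <code>EXPORTED_SYMBOLS</code> line and the import of <code>parser.js</code>) would replace the stand-in constructor.</p>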

<h3>Special cases</h3>

<p>Some special language features can be handled by overriding the default behavior from <code>Parser</code>. Many of these features are still in the works, however, so we&#8217;d love to get your comments!</p>

<h4>Languages with no spaces</h4>

<p>If your language does not delimit arguments (or words, more generally) with spaces, you will need to write a custom <code>wordBreaker()</code> function and set <code>usespaces = false</code> and <code>joindelimiter = ''</code>. For an example, please take a look at the <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/ja.js">Japanese</a> or <a href="https://ubiquity.mozilla.com/hg/ubiquity-firefox/raw-file/08cf861ba79a/ubiquity/modules/parser/new/zh.js">Chinese</a> language files.</p>
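<p>As a rough idea of what such a function has to accomplish, here is a naive sketch. The signature is an assumption (a string in, a list of words out) and may not match what <code>Parser</code> actually expects, so check the Japanese file linked above for the real contract. The function name <code>naiveWordBreaker</code> and the delimiter list are made up for this example.</p>

```javascript
// Naive sketch only. The input/output contract is an assumption, not the
// documented wordBreaker() interface of Parser 2.
// It separates known delimiters from the surrounding text so that later
// steps can treat them as individual words.
function naiveWordBreaker(input, delimiters) {
  var spaced = input;
  delimiters.forEach(function (delimiter) {
    // put a space on both sides of every occurrence of the delimiter
    spaced = spaced.split(delimiter).join(' ' + delimiter + ' ');
  });
  // drop leading/trailing spaces and split on runs of whitespace
  return spaced.trim().split(/\s+/);
}

console.log(naiveWordBreaker('メアリーからジョンに', ['から', 'に']));
```

<p>This approach breaks on any word which happens to contain a delimiter, which is one reason a real implementation needs more care than this.</p>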

<h4>Case marking languages</h4>

<p><strike>If you have a strongly <a href="http://en.wikipedia.org/wiki/grammatical case">case-marked</a> language, you&#8217;ll have to write some rules to identify those different cases in <code>wordBreaker()</code> and then add some extra <code>roles</code> for these case markers, but for a number of languages the current design does not allow an elegant solution for parsing such arguments. Updates to this issue will be posted to <a href="http://ubiquity.mozilla.com/trac/ticket/663">this trac ticket</a>.</p>

<p>In the meantime, however, if you could write a parser even with only the prepositions/postpositions in your language, that would be a great help in getting started with your language.</strike> <strong>UPDATE</strong>: a proposal on how to deal with strongly case-marked languages has been written here: <a href="http://mitcho.com/blog/projects/in-case-of-case/">In Case of Case&#8230;</a>.</p>

<h4>Stripping articles</h4>

<p>Some languages have some delimiters which combine with articles. For example, in French, the preposition &#8220;à&#8221; combines with the masculine definite article &#8220;le&#8221; but not &#8220;la&#8221;:</p>

<ol>
<li>à + la = à la</li>
<li>à + le = au</li>
</ol>

<p>You can add both &#8220;à&#8221; and &#8220;au&#8221; as delimiters of the <code>goal</code> role, but then you will get feminine arguments back with the determiner (e.g. &#8220;la table&#8221;) while masculine arguments would be parsed without a determiner (e.g. &#8220;chat&#8221;).</p>

<ol>
<li>&#8220;<b>à</b> la table&#8221; = &#8220;<b>to</b> the table&#8221;</li>
<li>&#8220;<b>au</b> chat&#8221; = &#8220;<b>to the</b> cat&#8221;</li>
</ol>
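<p>The asymmetry can be reproduced with a few lines of JavaScript. This is only a sketch of the effect, not how Parser 2 extracts arguments: <code>goalArgument</code> is a hypothetical helper which strips a leading delimiter and returns the rest.</p>

```javascript
// Both forms registered as delimiters of the goal role, as described above.
var goalDelimiters = ['à', 'au'];

// Hypothetical helper (not Parser 2 code): if the input starts with one of
// the delimiters followed by a space, return the remaining text as the
// argument; otherwise return null.
function goalArgument(input) {
  for (var delimiter of goalDelimiters) {
    var prefix = delimiter + ' ';
    if (input.indexOf(prefix) === 0) {
      return input.slice(prefix.length);
    }
  }
  return null;
}

console.log(goalArgument('à la table')); // the article stays on the argument
console.log(goalArgument('au chat'));    // the article was absorbed by "au"
```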

<p><strike>One possible solution to this is to write a custom <code>cleanArgument()</code> method. After arguments have been parsed and placed in their appropriate roles, each argument text (say, &#8220;la table&#8221; or &#8220;chat&#8221;) is passed to <code>cleanArgument()</code>. You can simply write a <code>cleanArgument()</code> which strips off any &#8220;la &#8221; at the beginning of the input and returns the rest, so that both example inputs yield normalized arguments: &#8220;table&#8221; and &#8220;chat&#8221;, respectively.</strike> <strong>UPDATE</strong>: For more up-to-date information on how to deal with these types of articles, please see <a href="http://mitcho.com/blog/projects/solving-a-romantic-problem/">Solving a Romance Problem</a>.</p>

<h3>Test your parser</h3>

<p>Now you can go into <code>about:config</code> and change <code>extensions.ubiquity.language</code> to be your language code and restart. All the verbs and nountypes at this point will remain the same as in the English version, but it should obey the argument structure (the word order and delimiters) of your language.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> If you run into any trouble, feel free to ask for help on the <a href="http://groups.google.com/group/ubiquity-i18n">Ubiquity i18n listhost</a> or find me on the Ubiquity IRC channel (mitcho @ irc.mozilla.org#ubiquity). Of course, once you&#8217;re at a good stopping point, please <a href="http://ubiquity.mozilla.com/trac/ticket/662">contribute your language file to Ubiquity</a>!</p>

<h3>More to come&#8230;</h3>

<p>At this point, you&#8217;ve only localized the <a href="http://en.wikipedia.org/wiki/argument structure">argument structure</a> of your language&#8230; additional work will be required to localize the nountypes and verb names, which is <a href="http://groups.google.com/group/ubiquity-i18n/browse_thread/thread/ab4d876b1ea02d4">the subject of ongoing discussion</a>&#8230; <a href="http://groups.google.com/group/ubiquity-i18n">join the Google Group</a> to get in on the discussion!</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>At this point in time it&#8217;s also possible to test your parser at <code>chrome://parser-demo/content/index.html</code> if you make a couple other changes to your code&#8230; for more information, watch the <a href="http://mitcho.com/blog/projects/foxkeh-demos-ubiquity-parser-the-next-generation/">Foxkeh demos Ubiquity Parser TNG</a> video. This option gives you more debug info as well.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/how-to/adding-your-language-to-ubiquity-parser-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Rolling out the Roles</title>
		<link>http://mitcho.com/blog/projects/rolling-out-the-roles/</link>
		<comments>http://mitcho.com/blog/projects/rolling-out-the-roles/#comments</comments>
		<pubDate>Thu, 09 Apr 2009 07:07:27 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[argument structure]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[proposal]]></category>
		<category><![CDATA[semantic role]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1763</guid>
		<description><![CDATA[Jono and I have recently been working to incorporate the Parser The Next Generation into Ubiquity proper, and this of course involves the process of retooling the standard commands with semantic roles. The first step, however, is to come up with a list of universal semantic roles which the verbs will be rewritten to use [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/' rel='bookmark' title='Writing commands with semantic roles'>Writing commands with semantic roles</a></li>
<li><a href='http://mitcho.com/blog/projects/ubiquity-commands-by-the-numbers/' rel='bookmark' title='Ubiquity Commands by The Numbers'>Ubiquity Commands by The Numbers</a></li>
<li><a href='http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/' rel='bookmark' title='Three ways to argue over arguments'>Three ways to argue over arguments</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>Jono and I have recently been working to incorporate the <a href="http://mitcho.com/blog/projects/ubiquity-parser-the-next-generation-demo/">Parser The Next Generation</a> into Ubiquity proper, and this of course involves the process of <a href="http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/">retooling the standard commands with semantic roles</a>. The first step, however, is to come up with a list of universal semantic roles which the verbs will be rewritten to use and individual languages&#8217; parsers will be built to identify. Today I have just such a proposal.</p>

<p><span id="more-1763"></span></p>

<h3>Something to consider&#8230;</h3>

<p>As we rewrite the current commands to specify semantic roles instead of specific modifiers, it is important to notice cases where a single English preposition actually maps to more than one semantic role. Here are two examples:</p>

<ul>
<li><code>with</code>: English &#8220;with&#8221; can refer to one of two relations: &#8220;together-with&#8221; as in &#8220;share this with Jono&#8221; and &#8220;using-with&#8221; as in &#8220;share this with delicious&#8221; or &#8220;eat this with a fork.&#8221;</li>
<li><code>in</code>: &#8220;in&#8221;, similarly, can refer to two different relations: &#8220;location-in&#8221; as in &#8220;find mexican food in Tokyo&#8221; and &#8220;format-in&#8221; as in &#8220;search Moscow in Russian&#8221; or &#8220;save this page in PDF.&#8221;</li>
</ul>

<p>A quick test for such cases is &#8220;would these markers translate to the same markers in a different language?&#8221; It&#8217;s easy to find a language where the two different &#8220;with&#8221;s and the two different &#8220;in&#8221;s are expressed using different words. <em>With semantic roles in Parser TNG, it&#8217;s okay for multiple semantic roles to share the same delimiters/markers.</em></p>

<h3>A proposed set of semantic roles</h3>

<p>Here is a set of semantic roles which I would like to propose. <em>Keep in mind that these roles should map to morphological features in languages, not necessarily to the type of content in the argument (which is why we also will keep the noun types).</em></p>

<ul>
<li><code>object</code>: direct object (the default or unmarked argument)</li>
<li><code>goal</code>: the goal or end point of (metaphorical) movement or transition

<ul>
<li>example: in English, arguments marked by &#8220;to&#8221;, &#8220;into&#8221;, &#8220;toward&#8221;, etc.</li>
</ul></li>
<li><code>source</code>: the source or starting point of (metaphorical) movement or transition<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>

<ul>
<li>example: in English, arguments marked by &#8220;from&#8221;, &#8220;by&#8221;, etc.</li>
</ul></li>
<li><code>position</code>: refers to a (metaphorical) location which defines the scope of an action, in contrast to <code>goal</code> and <code>source</code>.

<ul>
<li>example: in English, arguments marked by &#8220;in&#8221;, &#8220;at&#8221;, &#8220;near&#8221;, etc.</li>
</ul></li>
<li><code>instrument</code>: a tool or intermediary to be used 

<ul>
<li>example: in English, arguments marked by &#8220;using&#8221; or &#8220;with&#8221;, as in &#8220;bookmark this with delicious.&#8221;</li>
</ul></li>
<li><code>format</code>: describes the intended or expected form of the result

<ul>
<li>example: in English, arguments marked by &#8220;in&#8221; as in &#8220;in PDF form&#8221; or &#8220;in German&#8221;</li>
</ul></li>
<li><code>alias</code>: a name or reference to 

<ul>
<li>example: in English, arguments marked by &#8220;as&#8221; as in &#8220;tag this as new&#8221; or &#8220;login to mail as aza.&#8221;</li>
</ul></li>
</ul>

<p>Note that all three locational roles, <code>goal</code>, <code>source</code>, and <code>position</code>, may be used for both times and places, as the morphological marking of temporal and spatial expressions is often conflated in language. The appropriate type of referent (time or space) can then be specified with the noun type.</p>

<p>As a quick sanity check of this proposal, here are all the standard feeds built into Ubiquity which have multiple arguments together with what semantic role is appropriate for each argument:</p>

<table>
<thead>
<tr><th>command</th><th>current modifier</th><th>semantic role</th></tr>
</thead>
<tbody style='font-family: monospace'>
<tr><th>convert</th><td>to</td><td>goal, format</td></tr>
<tr><th>email</th><td>to</td><td>goal</td></tr>
<tr><th rowspan='2'>translate</th><td>to</td><td>goal, format</td></tr>
<tr><td>from</td><td>source</td></tr>
<tr><th>search</th><td>with</td><td>instrument</td></tr>
<tr><th>wikipedia</th><td>in</td><td>format</td></tr>
<tr><th>yelp</th><td>near</td><td>position</td></tr>
<tr><th>weather</th><td>in</td><td>position</td></tr>
<tr><th>twitter</th><td>as</td><td>alias</td></tr>
<tr><th rowspan='2'>share-on-delicious</th><td>tagged</td><td>alias</td></tr>
<tr><td>entitled</td><td>alias</td></tr>
</tbody>
</table>

<p>The only problematic standard command, then, is the <code>share-on-delicious</code> command which can take both tags and a title, both of which would most naturally correspond to the <code>alias</code> role. <strong>If you have a suggestion for how best to deal with this type of case, I&#8217;d love to hear it!</strong></p>
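<p>The same sanity check can be run mechanically. The sketch below encodes the table as plain data and looks for commands where two different arguments would land in the same role. The data layout and the <code>commandsWithRoleClash</code> helper are invented for this illustration and are not how Ubiquity stores its commands. The <code>weather</code> row uses the <code>position</code> role from the inventory above.</p>

```javascript
// The table above as data: command name -> list of arguments, each with its
// current modifier and the proposed role(s). Layout is hypothetical.
var commands = {
  'convert': [{modifier: 'to', roles: ['goal', 'format']}],
  'email': [{modifier: 'to', roles: ['goal']}],
  'translate': [
    {modifier: 'to', roles: ['goal', 'format']},
    {modifier: 'from', roles: ['source']}
  ],
  'search': [{modifier: 'with', roles: ['instrument']}],
  'wikipedia': [{modifier: 'in', roles: ['format']}],
  'yelp': [{modifier: 'near', roles: ['position']}],
  'weather': [{modifier: 'in', roles: ['position']}],
  'twitter': [{modifier: 'as', roles: ['alias']}],
  'share-on-delicious': [
    {modifier: 'tagged', roles: ['alias']},
    {modifier: 'entitled', roles: ['alias']}
  ]
};

// Hypothetical helper: names of commands in which a role proposed for one
// argument has already been proposed for an earlier argument.
function commandsWithRoleClash(commandTable) {
  return Object.keys(commandTable).filter(function (name) {
    var seen = {};
    return commandTable[name].some(function (argument) {
      var clash = argument.roles.some(function (role) {
        return seen[role] === true;
      });
      // record this argument's roles only after checking it
      argument.roles.forEach(function (role) {
        seen[role] = true;
      });
      return clash;
    });
  });
}

console.log(commandsWithRoleClash(commands));
```

<p>Only <code>share-on-delicious</code> is reported, matching the observation above.</p>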

<p>We&#8217;d love to get your feedback on this proposed set of semantic roles. <strong>How do you feel about the proposed set of semantic roles laid out here?</strong> In particular, if you have a command, or can envision a command, which needs an argument that does not fit any of these roles or which would take multiple arguments of the same role, please let us know! ^^</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>The <a href="http://scholar.google.com/scholar?q=&quot;types+of+lexical+information&quot;+fillmore">Fillmore (1971)</a> semantic role of &#8220;result&#8221; may also be lumped into this.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/rolling-out-the-roles/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Writing commands with semantic roles</title>
		<link>http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/</link>
		<comments>http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/#comments</comments>
		<pubDate>Tue, 24 Feb 2009 08:05:23 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[argument structure]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[coding properties]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[proposal]]></category>
		<category><![CDATA[semantic role]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[verbs]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1497</guid>
		<description><![CDATA[Thank you to everyone who contributed data to how your language identifies its arguments! The data collection is ongoing so please contribute data points for languages you know! How Ubiquity identifies its arguments Currently when writing a command in Ubiquity you must specify two properties for each argument: a modifier (the appropriate adposition—the direct object [...]
]]></description>
			<content:encoded><![CDATA[<p><em>Thank you to everyone who contributed data to <a href="http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/">how your language identifies its arguments</a>! The data collection is ongoing so please contribute data points for languages you know!</em></p>

<h3>How Ubiquity identifies its arguments</h3>

<p>Currently <a href="https://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_0.1_Author_Tutorial">when writing a command</a> in Ubiquity you must specify two properties for each argument: a modifier (the appropriate <a href="http://en.wikipedia.org/wiki/adposition">adposition</a>—the direct object excluded) and the <a href="https://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_0.1_Nountypes_Reference">noun type</a>. Here are some quick examples from the standard commands:</p>

<p><code>email</code>:</p>

<ul>
<li>direct object (<code>noun_arb_text</code>)</li>
<li><code>to</code> (<code>noun_type_contact</code>)</li>
</ul>

<p><code>translate</code>:</p>

<ul>
<li>direct object (<code>noun_arb_text</code>)</li>
<li><code>to</code> (<code>noun_type_language</code>)</li>
<li><code>from</code> (<code>noun_type_language</code>)</li>
</ul>

<p>This way of specifying arguments has a few shortcomings. First of all, it requires you to identify each type of argument by a unique adposition, which supports neither languages with <a href="http://en.wikipedia.org/wiki/case marking">case marking</a> nor languages with sets of synonymous adpositions (e.g. French {à la, au, aux}). Second, as we saw in <a href="http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/">how your language identifies its arguments</a>, some languages don&#8217;t mark semantic roles on the arguments at all, and the current system of specifying arguments is completely incompatible with these languages. Third, the current specification requires command authors to make localized versions of their commands, specifying the language-appropriate modifiers.</p>
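<p>As a rough illustration, the two argument lists above can be written down as plain data keyed by English modifier. This is a simplified sketch and not the actual Ubiquity command API: the object layout and the <code>modifiersFor</code> helper are assumptions made for illustration, and noun types are written as strings so the snippet is self-contained.</p>

```javascript
// Hypothetical, simplified representation of modifier-based argument specs.
// Noun types are plain strings here so the sketch runs on its own.
const modifierSpecs = {
  email: {
    directObject: "noun_arb_text",
    modifiers: { to: "noun_type_contact" },
  },
  translate: {
    directObject: "noun_arb_text",
    modifiers: { to: "noun_type_language", from: "noun_type_language" },
  },
};

// List the English adpositions a command is hard-wired to.
function modifiersFor(commandName) {
  return Object.keys(modifierSpecs[commandName].modifiers);
}

console.log(modifiersFor("translate")); // [ 'to', 'from' ]
```

<p>Because the keys are English words, a localized version of a command would need its own copy of this table, which is the third shortcoming described above.</p>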

<p><span id="more-1497"></span></p>

<p>In a perfect world the last issue could be solved (at least for languages which mark semantic roles with adpositions) by a mapping of English prepositions to the target language adpositions. Indeed, for some adpositions in some languages this may be possible:</p>

<table border='0'>
<tr><th colspan='2'>English/Ubiquity</th><th>Chinese</th><th>Japanese</th></tr>
<tr><td>to</td><td rowspan='2'>=></td><td>到 (dào)</td><td>-に (-ni)</td></tr>
<tr><td>from</td><td>从 (cóng)</td><td>-から (-kara)</td></tr>
</table>

<p>However, some English prepositions do not map cleanly to a particular adposition. Take, for example, English &#8220;with.&#8221; This &#8220;with&#8221; may map to different markings in Chinese and Japanese depending on the sentence:</p>

<table border='0'>
<tr><th colspan='2'>English</th><th>Chinese</th><th>Japanese</th></tr>
<tr><td>share <strong>with</strong> Jono</td><td rowspan='2'>=></td><td>跟 (gēn)</td><td>-と (-to)</td></tr>
<tr><td>translate <strong>with</strong> Google</td><td>用 (yòng)</td><td>-で (-de)</td></tr>
</table>

<p>Note, however, that which set of markings &#8220;with&#8221; maps to is predictable, as there is a salient semantic difference. The first &#8220;with&#8221; could be referred to as <em>together-with</em> while the second is a <em>using-with</em>. With this distinction, we can easily predict which paradigm the &#8220;with&#8221; in &#8220;search <strong>with</strong> Google&#8221; should use, because these two &#8220;with&#8221; arguments represent two different <em>semantic roles</em>.</p>
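<p>One way to picture this distinction is a lookup keyed by role rather than by English word. The sketch below uses the markers from the table above; the role labels <code>together_with</code> and <code>using_with</code> simply follow the informal names used in this post, and the <code>markerFor</code> helper is hypothetical.</p>

```javascript
// Markers per role, taken from the table above.
// The role labels are informal names, not an official role inventory.
const markers = {
  together_with: { en: "with", zh: "跟", ja: "と" },
  using_with: { en: "with", zh: "用", ja: "で" },
};

// Look up how a given role is marked in a given language.
function markerFor(role, lang) {
  return markers[role][lang];
}

console.log(markerFor("using_with", "ja"));    // で
console.log(markerFor("together_with", "zh")); // 跟
```

<p>English collapses both roles onto &#8220;with&#8221;, but once the role is known the Chinese and Japanese markers follow directly.</p>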

<h3>A proposal: identifying arguments by semantic role<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></h3>

<p>Suppose commands could specify their arguments by referring to these <em>semantic roles</em> in lieu of adpositions as they currently do. This way, we would be able to automatically map commands into different languages. For example, you could write a new command called <code>move</code> with the following argument structure:</p>

<p><code>move</code>:</p>

<ul>
<li><code>role_object</code> (<code>noun_arb_text</code>)</li>
<li><code>role_goal</code> (<code>noun_type_geolocation</code>)</li>
<li><code>role_source</code> (<code>noun_type_geolocation</code>)</li>
</ul>

<p>The English mapping of &#8216;&#8217; (no marker) => <code>role_object</code>, &#8216;to&#8217; => <code>role_goal</code>, &#8216;from&#8217; => <code>role_source</code> could be used to parse the command:</p>


<div class="wp_syntax"><div class="code"><pre class="english" style="font-family:monospace;">move truck from Tokyo to Paris</pre></div></div>


<p>In addition, with the Japanese mapping of &#8216;を&#8217; => <code>role_object</code>, &#8216;に&#8217; => <code>role_goal</code>, &#8216;から&#8217; => <code>role_source</code>, you could immediately use the command in Japanese as well:</p>


<div class="wp_syntax"><div class="code"><pre class="japanese" style="font-family:monospace;">東京からパリにトラックをmoveして</pre></div></div>
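<p>To make the idea concrete, here is a minimal sketch of how two such mappings might drive two very small parsers. Everything in it is an assumption for illustration: the <code>roleMaps</code> table, the <code>parseEnglish</code> and <code>parseJapanese</code> functions, and the naive rule that a Japanese particle never occurs inside an argument. The Japanese object particle is taken to be を, as in the example sentence.</p>

```javascript
// Hypothetical per-language tables mapping a marker to a semantic role.
// The empty string stands for the unmarked (direct object) argument.
const roleMaps = {
  en: new Map([["", "role_object"], ["to", "role_goal"], ["from", "role_source"]]),
  ja: new Map([["を", "role_object"], ["に", "role_goal"], ["から", "role_source"]]),
};

// English: the verb comes first and prepositions precede their arguments.
function parseEnglish(input) {
  const [verb, ...words] = input.split(" ");
  const args = {};
  let marker = ""; // until a preposition is seen, words belong to the object
  for (const word of words) {
    if (word === "") continue;
    if (roleMaps.en.has(word)) {
      marker = word;
      continue;
    }
    const role = roleMaps.en.get(marker);
    args[role] = args[role] ? args[role] + " " + word : word;
  }
  return { verb, args };
}

// Japanese: particles follow their arguments and the verb comes last.
// This naive scan assumes particles never occur inside an argument.
function parseJapanese(input) {
  const particles = [...roleMaps.ja.keys()].sort((a, b) => b.length - a.length);
  const args = {};
  let rest = input;
  let current = "";
  while (rest.length !== 0) {
    const particle = particles.find((p) => rest.startsWith(p));
    if (particle !== undefined && current !== "") {
      args[roleMaps.ja.get(particle)] = current;
      current = "";
      rest = rest.slice(particle.length);
    } else {
      current += rest[0];
      rest = rest.slice(1);
    }
  }
  return { verb: current, args };
}

console.log(parseEnglish("move truck from Tokyo to Paris").args);
// { role_object: 'truck', role_source: 'Tokyo', role_goal: 'Paris' }
console.log(parseJapanese("東京からパリにトラックをmoveして").args);
// { role_source: '東京', role_goal: 'パリ', role_object: 'トラック' }
```

<p>Both parses end up with the same three roles filled, which is all the <code>move</code> command itself would need to see.</p>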


<p>In essence, this proposal would let command authors get their commands localized <em>for free</em>, as long as they stick to a predefined set of semantic roles. For more complex commands and legacy commands, of course, commands could optionally specify particular English modifiers, but then Ubiquity would simply not attempt to localize those commands.</p>

<p>In addition, each language-specific parser would determine how to identify its arguments. This would allow languages with case marking, or with no role marking on arguments at all, to handle their own mapping of arguments to semantic roles and still use shared commands. Even a language like English would benefit, since its parser could deal with synonymous prepositions and possibly even argument structure alternations (such as English <a href="http://en.wikipedia.org/wiki/ditransitive alternations">ditransitive alternations</a>).</p>
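<p>Synonymous adpositions are the simplest case to sketch: several surface forms resolve to one role inside the parser, and the command never sees the difference. The table below uses the French forms mentioned earlier; treating them as markers of <code>role_goal</code> is an assumption for illustration, as is the <code>roleOfMarker</code> helper.</p>

```javascript
// Hypothetical French marker table. The three forms from the text
// (à la, au, aux) are assumed here to mark the same role, role_goal.
const frenchMarkers = new Map([
  ["à la", "role_goal"],
  ["au", "role_goal"],
  ["aux", "role_goal"],
]);

// Resolve a surface marker to its role; undefined means "not a known marker".
function roleOfMarker(marker) {
  return frenchMarkers.get(marker.toLowerCase());
}

console.log(roleOfMarker("au"));   // role_goal
console.log(roleOfMarker("Aux"));  // role_goal
console.log(roleOfMarker("chez")); // undefined
```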

<p>As a starting point, we could use argument types based on the list of semantic roles given in <a href="http://scholar.google.com/scholar?q=&quot;types+of+lexical+information&quot;+fillmore">Fillmore (1971)</a>:</p>

<ul>
<li>Object: the entity that moves or changes or whose position or existence is in consideration</li>
<li>Result: the entity that comes into existence as a result of the action</li>
<li>Instrument: the stimulus or immediate physical cause of an event</li>
<li>Source: the place from which something moves</li>
<li>Goal: the place to which something moves</li>
<li>Experiencer: the entity which receives or accepts or experiences or undergoes the effect of an action &#8230;</li>
</ul>

<h3>Comments welcome!</h3>

<p><strong>As command authors and Ubiquity users, how do you feel about this proposal? How might this affect, simplify, or complicate the localization of Ubiquity into your language?</strong> Thank you in advance! ^^</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>Thank you to <a href="http://jonoscript.wordpress.com">Jono</a> and <a href="http://theunfocused.net/">Blair</a> whose comments in <a href="https://wiki.mozilla.org/Labs/Ubiquity/Meetings/2009-02-23_i18n_Meeting">our i18n meeting</a> helped shape this proposal.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Ubiquity in Firefox: Focus on Japanese</title>
		<link>http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/</link>
		<comments>http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/#comments</comments>
		<pubDate>Fri, 20 Feb 2009 11:08:14 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[argument structure]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[Firefox]]></category>
		<category><![CDATA[interface]]></category>
		<category><![CDATA[Japanese language]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[mockup]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[verbs]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1466</guid>
		<description><![CDATA[One of the eventual goals of the Ubiquity project is to bring some of its functionality and ideas to Firefox proper. To this end, Aza has been exploring some possible options for what that would look like (round 1, round 2). All of his mockups, however, use English examples. I&#8217;m going to start exploring what [...]
]]></description>
			<content:encoded><![CDATA[<p>One of the eventual goals of the <a href="http://ubiquity.mozilla.com">Ubiquity project</a> is to bring some of its functionality and ideas to Firefox proper. To this end, <a href="http://azarask.in">Aza</a> has been exploring some possible options for what that would look like (<a href="http://www.azarask.in/blog/post/ubiquity-in-firefox-round-1/">round 1</a>, <a href="http://www.azarask.in/blog/post/ubiquity-in-the-firefox-round-2/">round 2</a>). All of his mockups, however, use English examples. I&#8217;m going to start exploring what Ubiquity in Firefox might look like in different kinds of languages. Let&#8217;s kick this off with my mother tongue, Japanese.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<p><em>今後多様な言語に対応したFirefox内のUbiquityを検討していきますが、その中でも今日は日本語をとりあげます。後日日本語で同じ内容を投稿するつもりです。^^</em> <strong>日本語でのコメントも大歓迎です！</strong></p>

<p><span id="more-1466"></span></p>

<h3>What commands look like in Japanese</h3>

<p>Japanese is not just a verb-final language: it is strongly <a href="http://en.wikipedia.org/wiki/head-final">head-final</a>, meaning that it has postpositions instead of prepositions, direct objects come before verbs, and adjectives precede nouns. In terms of <a href="http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/">how it identifies its arguments</a>, every argument has a postposition/case marker (called a <em>particle</em> in the Japanese literature) which marks its role in the sentence.</p>

<p>Two common particles we&#8217;ll look at in this example are -を (<em>-o</em>), which marks the direct object (accusative case, you might say), and -に (<em>-ni</em>), which acts like English &#8220;to&#8221; (dative case). The example sentence we&#8217;ll look at today is:</p>

<table border='0'>
<tr><td>ケーキを</td><td>ブレアに</td><td>送って</td><td>(ください)</td></tr>
<tr><td><em>kēki-o</em></td><td><em>burea-ni</em></td><td><em>okuʔte</em></td><td><em>kudasai</em></td></tr>
<tr><td>cake.ACC</td><td>Blair.DAT</td><td>send.IMP</td><td>&#8220;please&#8221;</td></tr>
<tr><td colspan='4'>&#8220;Please send a cake to Blair.&#8221;</td></tr>
</table>

<p>(Note: ʔ is a <a href="http://en.wikipedia.org/wiki/glottal stop">glottal stop</a>. ACC=accusative, DAT=dative, and IMP=imperative form.)</p>

<p>That final ください is often dropped in very casual speech and, as it adds no new information, we&#8217;ll assume today that the user will not enter it. Finally, Japanese doesn&#8217;t use spaces in its orthography, so the actual input would be &#8220;ケーキをブレアに送って&#8221;.</p>

<h3>Mockup 1: Particle identification</h3>

<p>One of the major hurdles in working with Japanese is that there are no spaces between the words. The natural first step is to split the sentence up into words, but this is a very difficult problem in <a href="http://en.wikipedia.org/wiki/Natural Language Processing">NLP</a> which <a href="http://research.microsoft.com/en-us/projects/japanesenlp/default.aspx">big name research groups</a> actively work on.</p>

<p>Fortunately, however, in <a href="http://www.azarask.in/blog/post/solving-the-it-problem/">&#8220;Solving the &#8216;It&#8217; Problem&#8221;</a> Aza suggests that, when we encounter ambiguity in our input, we can <em>go ask the user</em>. Great minds think alike, and computer scientist <a href="http://en.wikipedia.org/wiki/Jean E. Sammet">Jean E. Sammet</a> suggested the same idea <a href="http://doi.acm.org/10.1145/365230.365274">way back in 1966</a>:</p>

<blockquote>
  <p>Using English [or any other natural language] definitely involves the requirement for the computer (or more accurately its programming system) to query the user about any possible ambiguity.</p>
</blockquote>

<p>Parsing a sentence into words, in the limited context of Ubiquity, is really about identifying the particles which mark the end of each argument. Here&#8217;s a mockup of an application of the Sammet-Raskin Method to this problem:</p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/02/particle-id.png" alt="particle-id.png" border="0" /></center></p>

<p><strong>Pros:</strong> This completely takes care of the word-breaking problem, with minimal arbitration from the user. The parser knows <em>exactly</em> what arguments it&#8217;s dealing with and the visual feedback means the user won&#8217;t be surprised by the parse.</p>

<p><strong>Cons:</strong> Most of the particles/postpositions we&#8217;d have to deal with are a single character, so they may show up pretty often within words, in which case it would be quite annoying to have to press escape after each one.</p>

<p>An even smarter system, when wanting to mark a character as a particle, would first check to see that the argument (before the particle) is a valid argument type for that particle. If the check fails, it doesn&#8217;t have to bother with suggesting that character as a particle. This may cut down on the false positives.</p>
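<p>The check described above might look something like the following sketch. The particle table, the toy noun-type checkers, the contact list, and the sample input おにぎりをブレアに送って (&#8220;send a rice ball to Blair&#8221;) are all assumptions made for illustration; the sketch also assumes the user confirms every suggestion.</p>

```javascript
// Hypothetical particle table: each particle lists the noun types it accepts.
const particleTypes = new Map([
  ["を", ["arbitrary_text"]],
  ["に", ["contact"]],
]);

// Toy noun-type checkers, standing in for real noun type detection.
const nounTypes = {
  arbitrary_text: (text) => text.length !== 0,
  contact: (text) => ["ブレア", "ジョノ"].includes(text),
};

// Return the particles worth suggesting, with the argument each one closes.
function suggestParticles(input) {
  const suggestions = [];
  let argStart = 0;
  for (let i = 0; i !== input.length; i++) {
    const accepted = particleTypes.get(input[i]);
    if (accepted === undefined) continue; // not a particle character
    const candidate = input.slice(argStart, i);
    if (accepted.some((type) => nounTypes[type](candidate))) {
      suggestions.push({ particle: input[i], argument: candidate });
      argStart = i + 1; // assume the user confirms the suggestion
    }
  }
  return suggestions;
}

console.log(suggestParticles("おにぎりをブレアに送って"));
// [ { particle: 'を', argument: 'おにぎり' }, { particle: 'に', argument: 'ブレア' } ]
```

<p>The に inside おにぎり is never offered as a particle, because お is not a valid contact. An arbitrary-text argument accepts anything, though, so a stray を inside a word would still be suggested; the check reduces false positives rather than eliminating them.</p>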

<h3>Smart suggestions: what works, what doesn&#8217;t</h3>

<p>One of the key suggestions in Aza&#8217;s mockups is a way to choose the prepositions while entering your arguments, based on the current verb.</p>

<p>For example, here, the <code>translate</code> command accepts a direct object, a <em>to</em>-object, and a <em>from</em>-object, so little <code>to</code> and <code>from</code> markers magically show up on the right side, making the appropriate prepositions (and by extension the appropriate arguments) discoverable. I think this line of thinking is a really good one, at least for English.</p>

<p><center><a class='limages' rel='lightbox[verbfinal]' href='http://farm4.static.flickr.com/3359/3272673947_05b4a21881_o.jpg'><img src='http://farm4.static.flickr.com/3359/3272673947_14b59c2aa1.jpg'></a></center></p>

<p><strong>In a verb-final language, however, you enter the arguments first and then the verb, making this strategy of suggesting appropriate arguments impossible.</strong> Note that in the user-contributed spreadsheet of <a href="http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/">how languages identify their arguments</a> we see that about a quarter of the languages we looked at are verb-final—that is, with Subject-Object-Verb canonical word order.</p>

<p>Instead of seeing this as a disadvantage, however, let&#8217;s see what verb-final order <em>allows</em> us to do.</p>

<h3>Mockup 2: A different kind of suggestion</h3>

<p>Not all verbs allow for every different kind of particle. For example, it doesn&#8217;t make sense to have a -に (<em>-ni</em>, &#8220;to&#8221; or dative) argument for a verb like 検索して (<em>kensaku-shite</em>, &#8220;search for&#8221;). In English we used this to suggest different types of arguments given a specific verb. In a verb-final language, we could do this <em>backwards</em>.</p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/02/verb-suggestion.png" alt="verb-suggestion.png" border="0" /></center></p>

<p><strong>Pros:</strong> This makes verbs highly discoverable, given a certain argument structure. For example, if you enter a few arguments, like a direct object, a &#8220;to&#8221; argument, and a &#8220;from&#8221; argument, it&#8217;ll suggest verbs that will do something to an object from somewhere to somewhere else. This way, you can easily try out verbs you didn&#8217;t even know existed. It&#8217;ll only give you verbs appropriate for your arguments, reducing the chance of writing an infelicitous command.</p>

<p><strong>Cons:</strong> Without knowing what kinds of actions are available, it may be difficult to know what kinds of arguments to enter in the first place. If you have a specific verb or service you want to use it may be counterintuitive or downright tricky to start by guessing the right set of arguments.</p>

<p>In addition, from a technical point of view, this requires many of the prediction algorithms in English Ubiquity to run backwards. Ideally, there would be a closed (predetermined) class of particles and a predefined set of noun types. Verbs would not be able to define their own modifiers and noun classes as easily or freely as they can now.</p>
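<p>Running the suggestion backwards could be as simple as filtering verbs by the particles already entered. The sketch below assumes a closed particle set and a hypothetical verb table; 送って and 検索して come from the examples above, while 翻訳して (&#8220;translate&#8221;) and the particles each verb accepts are assumptions added for illustration.</p>

```javascript
// Hypothetical verb table: each verb lists the particles it accepts.
const verbParticles = {
  "送って": ["を", "に"],           // send: object, recipient
  "検索して": ["を"],               // search for: object only
  "翻訳して": ["を", "に", "から"], // translate: object, to, from
};

// Suggest every verb that accepts all particles entered so far.
function suggestVerbs(enteredParticles) {
  return Object.entries(verbParticles)
    .filter(([, accepted]) => enteredParticles.every((p) => accepted.includes(p)))
    .map(([verb]) => verb);
}

console.log(suggestVerbs(["を", "に"])); // [ '送って', '翻訳して' ]
console.log(suggestVerbs(["を"]));       // [ '送って', '検索して', '翻訳して' ]
```

<p>As soon as a -に argument is entered, 検索して drops out of the list, which mirrors the observation that a dative argument makes no sense for &#8220;search for&#8221;.</p>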

<h3>Conclusion</h3>

<p>The properties and challenges of Japanese grammar require that we not simply copy the English behavior, but instead think about what really makes sense in that language. That may be an important lesson as we move toward designing a localizable Ubiquity. Please post your questions and criticisms of this design or post your own mockups!</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>Happy <a href="http://www.un.org/depts/dhl/language/index.html">International Mother Language Day</a>! ^^&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/ubiquity-in-firefox-japanese/feed/</wfw:commentRss>
		<slash:comments>86</slash:comments>
		</item>
		<item>
		<title>Three ways to argue over arguments</title>
		<link>http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/</link>
		<comments>http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/#comments</comments>
		<pubDate>Wed, 18 Feb 2009 03:26:05 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[agreement]]></category>
		<category><![CDATA[ambiguity]]></category>
		<category><![CDATA[Ancient Greek]]></category>
		<category><![CDATA[argument structure]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[case]]></category>
		<category><![CDATA[Chinese]]></category>
		<category><![CDATA[coding properties]]></category>
		<category><![CDATA[English]]></category>
		<category><![CDATA[grammatical relations]]></category>
		<category><![CDATA[Hungarian]]></category>
		<category><![CDATA[Japanese language]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mandarin]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[verbs]]></category>
		<category><![CDATA[word order]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1413</guid>
		<description><![CDATA[UPDATE: Contribute information on how your language identifies its arguments here. When we execute a command in Ubiquity, in very simple terms, we&#8217;re hoping to do something (a verb) to some arguments (the nouns). Every sentence in every language uses some method to encode which arguments correspond to which roles of the verb. Here are [...]
]]></description>
			<content:encoded><![CDATA[<p><em>UPDATE: Contribute information on how your language identifies its arguments <a href="http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/">here</a>.</em></p>

<p>When we execute a command in Ubiquity, in very simple terms, we&#8217;re hoping to do something (a verb) to some arguments (the nouns). Every sentence in every language uses some method to encode which arguments correspond to which roles of the verb. Here are a couple examples:</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
</pre></td><td class="code"><pre class="english" style="font-family:monospace;">He sees Mary.
彼が Maryを 見る。 (Kare-ga Mary-o miru.)</pre></td></tr></table></div>


<p>As speakers of English, you can read sentence (1) above and know exactly who is doing the seeing and who is being seen and speakers of Japanese can get the same information from (2). <strong>How do different languages code for arguments in different roles?</strong> There are, broadly speaking, three different ways:</p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/02/threeways.png" alt="three ways to code for arguments in different roles" border="0" width="536" height="284" /></center></p>

<p>We&#8217;ll take a brief look today at these three different strategies, all of which <a href="http://www.azarask.in/blog/post/scaling-ubiquity-to-60-languages-we-need-your-help/">a localizable natural language interface</a> will surely encounter.</p>

<p><span id="more-1413"></span></p>

<h3>Word order</h3>

<p>In many languages, the position of the arguments relative to one another and to the verb determines the role which each argument will play. Mandarin Chinese is a good example of such a language:</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>3
4
</pre></td><td class="code"><pre class="chinese" style="font-family:monospace;">他 喜欢 Mary (Ta xihuan Mary)
Mary 喜欢 他 (Mary xihuan ta)</pre></td></tr></table></div>


<p>Here, sentence (3) says &#8220;he likes Mary&#8221; while sentence (4) says &#8220;Mary likes him&#8221;. By simply reversing the positions of &#8220;he/him&#8221; and &#8220;Mary&#8221;, we&#8217;re able to flip the roles that they fill in the sentence: that of the person who does the liking and that of the person who is being liked. Now take a look at sentence (5), which means &#8220;John says &#8216;hello&#8217; to Mary.&#8221;</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>5
</pre></td><td class="code"><pre class="chinese" style="font-family:monospace;">John 告诉 Mary &quot;你 好&quot; (John gaosu Mary &quot;ni hao&quot;)</pre></td></tr></table></div>


<p>We note here that, while in English we used a different strategy of marking one argument (we marked the &#8220;Mary&#8221; argument with &#8220;to&#8221;), Chinese doesn&#8217;t mark either of the arguments. There is, however, a clearly defined order to the arguments, which you might encode this way:</p>


<div class="wp_syntax"><div class="code"><pre class="code" style="font-family:monospace;">say [who you're speaking to] [what you're saying]</pre></div></div>


<p>If you swap the order of the two objects in this sentence, it becomes ungrammatical. (<strong>Note:</strong> the asterisk * here means the sentence is <em>ungrammatical</em>.)</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>5
</pre></td><td class="code"><pre class="chinese" style="font-family:monospace;">* John 告诉 &quot;你 好&quot; Mary (John gaosu &quot;ni hao&quot; Mary)</pre></td></tr></table></div>


<p>Here, the word order dictates that &#8220;你好&#8221; must be &#8220;who you&#8217;re speaking to&#8221; and &#8220;Mary&#8221; must be &#8220;what you&#8217;re saying,&#8221; but that doesn&#8217;t make sense, so the sentence is ungrammatical.</p>
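<p>A parser for a word-order language can therefore assign roles purely by position. The following is a minimal sketch under that assumption; the <code>frames</code> table, the role names, and the pre-tokenized input are all hypothetical.</p>

```javascript
// Hypothetical frame: the roles 告诉 assigns to the arguments that follow it.
const frames = new Map([["告诉", ["addressee", "message"]]]);

// Assign roles by position: whatever precedes the verb is the speaker,
// and the arguments after the verb fill the frame's slots in order.
function assignByPosition(tokens) {
  const verbIndex = tokens.findIndex((t) => frames.has(t));
  if (verbIndex === -1) return null;
  const roles = { speaker: tokens.slice(0, verbIndex).join(" ") };
  const slots = frames.get(tokens[verbIndex]);
  tokens.slice(verbIndex + 1).forEach((token, i) => {
    if (slots[i] !== undefined) roles[slots[i]] = token;
  });
  return roles;
}

console.log(assignByPosition(["John", "告诉", "Mary", "你 好"]));
// { speaker: 'John', addressee: 'Mary', message: '你 好' }
console.log(assignByPosition(["John", "告诉", "你 好", "Mary"]));
// { speaker: 'John', addressee: '你 好', message: 'Mary' }
```

<p>The second call shows the swapped order from the starred sentence: the positions still get filled, but with roles that make no sense.</p>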

<h3>Marking the arguments</h3>

<p>Another possible strategy is to mark each argument (or some of the arguments) so that each argument&#8217;s role is clear. In many languages this is done with <a href="http://en.wikipedia.org/wiki/case marking">case marking</a>. Take for example this Ancient Greek sentence with its English gloss on line (6). Here, NOM refers to <a href="http://en.wikipedia.org/wiki/nominative case">nominative case</a> and ACC refers to <a href="http://en.wikipedia.org/wiki/accusative case">accusative case</a>.<sup id="fnref:2"><a href="#fn:2" rel="footnote">1</a></sup></p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>5
6
</pre></td><td class="code"><pre class="ancient-greek" style="font-family:monospace;">ho  didaskal-os  paideuei to  paidi-on  (SVO)
the teacher -NOM teaches  the boy  -ACC</pre></td></tr></table></div>


<p>This sentence means &#8220;the teacher instructs the boy.&#8221; While sentence (5) is in Subject-Verb-Object order, all six possible orderings of {subject, verb, object} are grammatical and mean the same thing:<sup id="fnref:1"><a href="#fn:1" rel="footnote">2</a></sup></p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>7
8
9
10
11
</pre></td><td class="code"><pre class="ancient-greek" style="font-family:monospace;">ho didaskalos to paidion paideuei (SOV)
paideuei ho didaskalos to paidion (VSO)
paideuei to paidion ho didaskalos (VOS)
to paidion ho didaskalos paideuei (OSV)
to paidion paideuei ho didaskalos (OVS)</pre></td></tr></table></div>
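<p>Because the roles are carried by the endings, a parser can ignore word order entirely. The sketch below uses toy rules that cover only this example sentence (-os for nominative, -on for accusative, <em>ho</em>/<em>to</em> as articles, one known verb); real Greek morphology is far richer, so treat the rules and the <code>rolesByCase</code> function as illustrative assumptions.</p>

```javascript
// Toy rules that only cover the example sentence: -os marks nominative,
// -on marks accusative, and "ho"/"to" are articles.
const articles = new Set(["ho", "to"]);
const knownVerbs = new Set(["paideuei"]);

// Assign roles from case endings alone, ignoring word order.
function rolesByCase(sentence) {
  const roles = {};
  for (const word of sentence.split(" ")) {
    if (articles.has(word)) continue;
    if (knownVerbs.has(word)) roles.verb = word;
    else if (word.endsWith("os")) roles.subject = word;
    else if (word.endsWith("on")) roles.object = word;
  }
  return roles;
}

const orderings = [
  "ho didaskalos paideuei to paidion", // SVO
  "ho didaskalos to paidion paideuei", // SOV
  "paideuei ho didaskalos to paidion", // VSO
  "paideuei to paidion ho didaskalos", // VOS
  "to paidion ho didaskalos paideuei", // OSV
  "to paidion paideuei ho didaskalos", // OVS
];

// Every ordering should yield the same role assignment.
const sameRoles = orderings.every((s) => {
  const r = rolesByCase(s);
  return r.subject === "didaskalos" && r.object === "paidion" && r.verb === "paideuei";
});
console.log(sameRoles); // true
```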


<p>Many languages also use <a href="http://en.wikipedia.org/wiki/adposition">adpositions</a> (prepositions and/or postpositions) to further clarify the role of an argument in addition to case (like English does) or in lieu of case marking altogether. The idea is the same, though: you want to clarify the roles of the arguments so you morphologically mark the arguments with their roles.</p>

<h3>Marking the verb</h3>

<p>Many languages mark the verb with some information about the argument in a certain role, so that we can properly identify the arguments&#8217; roles. This kind of phenomenon is called <em>agreement</em>.</p>

<p>The most common type of verbal agreement is subject agreement, where the verb is marked by a specific form depending on some features of the subject. Anyone who&#8217;s taken French 101 will recognize this verb conjugation paradigm:</p>

<table>
<tr><th></th><th>subject</th><th>être (to be)</th></tr>
<tr><td rowspan='3'>singular</td><td>je (I)</td><td>suis</td></tr>
<tr><td>tu (you)</td><td>es</td></tr>
<tr><td>il/elle (he/she)</td><td>est</td></tr>
<tr><td rowspan='3'>plural</td><td>nous (we)</td><td>sommes</td></tr>
<tr><td>vous (plural you)</td><td>êtes</td></tr>
<tr><td>ils (they)</td><td>sont</td></tr>
</table>

<p>With this paradigm, if you hear or see &#8220;suis&#8221; in a French sentence, you immediately know that &#8220;je&#8221; (<em>I</em>) must be the subject and if you see &#8220;sommes,&#8221; &#8220;nous&#8221; (<em>we</em>) is the subject, etc. <a href="http://en.wikipedia.org/wiki/Standard Average European">Standard Average European</a> languages tend to exhibit this sort of subject-verb agreement.</p>
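<p>Read in reverse, the paradigm is a lookup from verb form to possible subjects. A small sketch, using only the forms in the table above (the <code>possibleSubjects</code> helper is hypothetical):</p>

```javascript
// Forms of être from the table above, keyed by verb form.
const etreAgreement = new Map([
  ["suis", ["je"]],
  ["es", ["tu"]],
  ["est", ["il", "elle"]],
  ["sommes", ["nous"]],
  ["êtes", ["vous"]],
  ["sont", ["ils"]],
]);

// Given a verb form, return the subject pronouns it agrees with.
function possibleSubjects(verbForm) {
  return etreAgreement.get(verbForm) || [];
}

console.log(possibleSubjects("suis"));   // [ 'je' ]
console.log(possibleSubjects("est"));    // [ 'il', 'elle' ]
console.log(possibleSubjects("mangez")); // []
```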

<p>Features of the subject position aren&#8217;t the only thing that can be marked on the verb, though. Hungarian, for example, has a type of object agreement. Specifically, the verb marks whether the object is definite or not (in linguistics lingo, &#8220;the verb agrees with the object&#8217;s definiteness feature&#8221;).</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>12
13
14
15
</pre></td><td class="code"><pre class="hungarian" style="font-family:monospace;">John lát  egy almát.
John sees an  apple
John látja az  almát.
John sees  the apple</pre></td></tr></table></div>


<p>Notice that in sentence (12) (glossed in (13)) the verb for &#8220;see&#8221; is realized as &#8220;lát,&#8221; while in (14) it&#8217;s &#8220;látja.&#8221; A speaker can use that agreement to see whether the object is definite or not and thus limit the possible object arguments out of all the nouns in the sentence.</p>
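<p>The same narrowing can be sketched in code: given the verb form, keep only the noun phrases whose definiteness matches. The tables below cover just the two example sentences, and the article lists and the <code>compatibleObjects</code> function are assumptions for illustration.</p>

```javascript
// Toy data from the example: "lát" goes with an indefinite object,
// "látja" with a definite one.
const verbWantsDefinite = new Map([["lát", false], ["látja", true]]);
const definiteArticles = new Set(["az"]);
const indefiniteArticles = new Set(["egy"]);

// Keep only the noun phrases whose article matches the verb's agreement.
function compatibleObjects(verb, nounPhrases) {
  const wantsDefinite = verbWantsDefinite.get(verb);
  return nounPhrases.filter((np) => {
    const article = np.split(" ")[0];
    return wantsDefinite
      ? definiteArticles.has(article)
      : indefiniteArticles.has(article);
  });
}

console.log(compatibleObjects("látja", ["egy almát", "az almát"])); // [ 'az almát' ]
console.log(compatibleObjects("lát", ["egy almát", "az almát"]));   // [ 'egy almát' ]
```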

<h3>All of the above</h3>

<p><a href='http://www.qwantz.com/'><img src="http://mitcho.com/blog/wp-content/uploads/2009/02/whom.gif" alt="whom.gif" border="0" width="650" height="442" /></a></p>

<p>Most languages do not use only one of these strategies, but a combination of them. English is a very good example. In a sentence like (16) below, the main coding of grammatical roles seems to be word order alone. By reversing the word order as in (17), we can effectively swap the arguments&#8217; roles.</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>16
17
</pre></td><td class="code"><pre class="english" style="font-family:monospace;">John likes Mary.
Mary likes John.</pre></td></tr></table></div>


<p>However, this doesn&#8217;t work with pronominal arguments. Swapping the arguments in (18) yields (19), which is ungrammatical due to the case marking on the pronouns.</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>18
19
</pre></td><td class="code"><pre class="english" style="font-family:monospace;">He likes her.
* Her likes he.</pre></td></tr></table></div>


<p>In addition, the verb in English must agree with the subject&#8217;s number (singular or plural):</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>20
21
22
</pre></td><td class="code"><pre class="english" style="font-family:monospace;">John likes them.
* They likes John.
They like John.</pre></td></tr></table></div>


<p>In this way, English exhibits all three strategies: word order, case marking, and agreement, although often only word order is actively used to disambiguate the roles of arguments.</p>
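
<p>As a rough illustration of how the three strategies interact, the JavaScript sketch below assigns roles by word order and then checks case marking and agreement. It is a toy under stated assumptions, not how Ubiquity&#8217;s parser works: the lexicons, the <code>caseMark</code> labels, and the <code>analyze</code> function are all invented here, and only three-word subject-verb-object sentences built from the example words are supported.</p>

```javascript
// Hypothetical mini-lexicon covering only the words in the English examples.
// caseMark "any" means the form does not change between subject and object.
const NOUNS = new Map([
  ["john", { number: "singular", caseMark: "any" }],
  ["mary", { number: "singular", caseMark: "any" }],
  ["he", { number: "singular", caseMark: "subject" }],
  ["her", { number: "singular", caseMark: "object" }],
  ["they", { number: "plural", caseMark: "subject" }],
  ["them", { number: "plural", caseMark: "object" }],
]);

// Hypothetical verb forms and the subject number each one agrees with.
const VERBS = new Map([
  ["likes", "singular"],
  ["like", "plural"],
]);

function analyze(sentence) {
  // Strip the final period and any "*" ungrammaticality marker.
  const words = sentence.toLowerCase().replace(/[.*]/g, "").trim().split(/\s+/);
  if (words.length !== 3) return { grammatical: false, reason: "unsupported shape" };

  // Strategy 1, word order: subject first, then verb, then object.
  const [subjectWord, verbWord, objectWord] = words;
  const subject = NOUNS.get(subjectWord);
  const object = NOUNS.get(objectWord);
  const verbNumber = VERBS.get(verbWord);
  if (!subject || !object || !verbNumber) return { grammatical: false, reason: "unknown word" };

  // Strategy 2, case marking: pronoun forms must fit their position.
  if (subject.caseMark === "object" || object.caseMark === "subject") {
    return { grammatical: false, reason: "case marking" };
  }

  // Strategy 3, agreement: the verb must match the subject's number.
  if (subject.number !== verbNumber) return { grammatical: false, reason: "agreement" };

  return { grammatical: true, subject: subjectWord, object: objectWord };
}

analyze("John likes Mary.");   // grammatical, subject "john", object "mary"
analyze("* Her likes he.");    // not grammatical, reason "case marking"
analyze("* They likes John."); // not grammatical, reason "agreement"
```

<p>For the proper names, only word order decides who likes whom; case marking and agreement act as extra checks that rule out some orderings.</p>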

<p><strong>Question:</strong> What strategies are used by your language to mark the roles of different arguments?</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:2">
<p>The following example is from <a href="http://www.personal.uni-jena.de/~x4diho/LingTyp%20Grammatical%20relations.ppt">Holger Diessel</a>.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:1">
<p>&#8220;Mean the same thing&#8221; here means that the teacher is always instructing and the boy is always being instructed. The sentences may differ in when or how they are used depending on which argument is being talked about or what the implications of the utterance are. The formal notion is <em>truth-conditional equivalence</em>.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
	</channel>
</rss>

