<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>mitcho.com &#187; case</title>
	<atom:link href="http://mitcho.com/blog/tag/case/feed/" rel="self" type="application/rss+xml" />
	<link>http://mitcho.com</link>
	<description></description>
	<lastBuildDate>Tue, 07 Feb 2012 02:04:41 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4-alpha-19719</generator>
		<item>
		<title>In Case of Case&#8230;</title>
		<link>http://mitcho.com/blog/projects/in-case-of-case/</link>
		<comments>http://mitcho.com/blog/projects/in-case-of-case/#comments</comments>
		<pubDate>Wed, 06 May 2009 09:54:53 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[Arabic]]></category>
		<category><![CDATA[Basque]]></category>
		<category><![CDATA[case]]></category>
		<category><![CDATA[German]]></category>
		<category><![CDATA[Latin]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[Polish]]></category>
		<category><![CDATA[Turkish]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1994</guid>
		<description><![CDATA[A recently hot topic of discussion in the Ubiquity i18n realm has been how to deal with strongly case-marking languages. As we continue to make steady progress, this is one of the remaining open questions which we must decide as a community how to tackle in Parser 2. Introduction Grammatical case is a marking on nouns [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/' rel='bookmark' title='Three ways to argue over arguments'>Three ways to argue over arguments</a></li>
<li><a href='http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/' rel='bookmark' title='Contribute: how your language identifies its arguments'>Contribute: how your language identifies its arguments</a></li>
<li><a href='http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/' rel='bookmark' title='Writing commands with semantic roles'>Writing commands with semantic roles</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p>A recently hot topic of discussion in the <a href="https://wiki.mozilla.org/Labs/Ubiquity/i18n">Ubiquity i18n</a> realm has been <a href="http://groups.google.com/group/ubiquity-i18n/browse_thread/thread/ab4d876b1ea02d4">how to deal with strongly case-marking languages</a>. As we continue to make <a href="http://ubiquity.mozilla.com/hg/ubiquity-firefox/log?rev=new-parser">steady progress</a>, this is one of the remaining open questions which we must decide, as a community, how to tackle in Parser 2.</p>

<h3>Introduction</h3>

<p><a href="http://en.wikipedia.org/wiki/Grammatical case">Grammatical case</a> is a marking on nouns that expresses grammatical function. Not all languages exhibit case. In many of the Indo-European languages we hope to bring Ubiquity to, case is realized as a suffix.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<p>Here&#8217;s a classic example of case from <a href="http://en.wikipedia.org/wiki/Latin">Latin</a>. (Line 2 is the gloss of 1, line 4 of 3.)</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
</pre></td><td class="code"><pre class="la" style="font-family:monospace;">canis      virum      momordit
dog=sg.NOM man=sg.ACC bite=3sg.perfect
vir        canem      momordit
man=sg.NOM dog=sg.ACC bite=3sg.perfect</pre></td></tr></table></div>


<p>Example (1) is &#8220;the dog bit the man,&#8221; while example (3) is &#8220;the man bit the dog.&#8221; The only difference, as you see in the gloss, is that the nouns <em>canis</em> and <em>vir</em> are marked with different case endings in the two sentences. By marking the nouns with different cases (here, <a href="http://en.wikipedia.org/wiki/nominative">nominative</a> and <a href="http://en.wikipedia.org/wiki/accusative">accusative</a>), their semantic roles in the sentence (which is the biter and which is the bitee) can be identified unambiguously. (Their positions are also switched in these examples, but in reality Latin has a very free word order: the same sentences with other word orders, including OSV or VSO, are also common.)</p>

<p>At first glance, strongly case-marked languages may look like a godsend for <a href="http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/">identifying the semantic roles of arguments</a>.<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup> If we can easily and unambiguously recognize arguments&#8217; cases to put them in their appropriate semantic roles, this could simplify processing as well as make Ubiquity input follow a <a href="http://mitcho.com/blog/projects/how-natural-should-a-natural-interface-be/">natural syntax</a> for such languages. Unfortunately, there are some significant challenges which must be overcome in order to make the processing of case-markers worthwhile.</p>

<p><span id="more-1994"></span></p>

<h3>The case against case</h3>

<p>There are broadly three difficulties in dealing with strongly case-marking languages: (1) how to identify case correctly, (2) how to identify the boundaries of the arguments, and (3) what case to use when handing the arguments to the verb&#8217;s preview and execution.</p>

<h4>Parsing for case</h4>

<p>In some languages, it is very easy to recognize different case endings. For example, for Turkish it would be relatively easy to write a regular expression for each of the cases below, even with the <a href="http://en.wikipedia.org/wiki/vowel harmony">vowel harmony</a> as exhibited in the genitive and accusative cases between <em>i</em> and <em>ü</em>.<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup></p>

<table>
<tr>
<th>Case</th>
<th>Ending</th>
<th><i>köy</i> &#8220;village&#8221;</th>
<th>Meaning</th>
</tr>
<tr>
<td>Nominative</td>
<td>Ø (none)</td>
<td><i>köy</i></td>
<td>village</td>
</tr>
<tr>
<td>Genitive</td>
<td><i>-in</i></td>
<td><i>köyün</i></td>
<td>the village&#8217;s<br />
of the village</td>
</tr>
<tr>
<td>Dative</td>
<td><i>-e</i></td>
<td><i>köye</i></td>
<td>to the village</td>
</tr>
<tr>
<td>Accusative</td>
<td><i>-i</i></td>
<td><i>köyü</i></td>
<td>the village</td>
</tr>
<tr>
<td>Ablative</td>
<td><i>-den</i></td>
<td><i>köyden</i></td>
<td>from the village</td>
</tr>
<tr>
<td>Locative</td>
<td><i>-de</i></td>
<td><i>köyde</i></td>
<td>in the village</td>
</tr>
</table>

<p>(Example from <a href="http://en.wikipedia.org/wiki/Turkish language">Turkish language</a> on Wikipedia.)</p>
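
<p>As a rough illustration of what such regular expressions might look like, here is a minimal Python sketch. It is not Parser 2 code: the pattern list and the <em>guess_case</em> helper are hypothetical, and it assumes we can ignore buffer consonants and consonant shifts. Longer endings are tried first, because a purely surface-based match on <em>-e</em> would otherwise claim <em>köyde</em> as a dative.</p>

```python
import re

# Surface patterns for the endings in the table above, with front and back
# vowel variants. Longer endings come first so that "köyde" is read as
# locative rather than as dative "-e".
# Assumption: buffer consonants and consonant shifts are ignored.
CASE_PATTERNS = [
    ("ablative", re.compile(r"[dt][ea]n$")),
    ("locative", re.compile(r"[dt][ea]$")),
    ("genitive", re.compile(r"[iıuü]n$")),
    ("dative", re.compile(r"[ea]$")),
    ("accusative", re.compile(r"[iıuü]$")),
]


def guess_case(word):
    """Return (case, stem) for the first ending that matches."""
    for case, pattern in CASE_PATTERNS:
        match = pattern.search(word)
        if match:
            return case, word[:match.start()]
    # No ending matched: treat the word as an unmarked nominative.
    return "nominative", word


for form in ["köy", "köyün", "köye", "köyü", "köyden", "köyde"]:
    print(form, guess_case(form))
```

<p>Note that this over-generates: any nominative noun that happens to end in one of these letters would be misread as case-marked, so a real implementation would need more context than the ending alone.</p>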

<p>However, in many other languages identifying case affixes can be quite difficult as they vary greatly depending on the root noun, not to mention irregular declensions. For example, in Polish the nominative <em>student</em> becomes <em>studenta</em> in the accusative, which may look like a simple suffix, but the nominative <em>pies</em> (&#8220;dog&#8221;) becomes <em>psa</em> while <em>stół</em> (&#8220;table&#8221;) remains unchanged.<sup id="fnref:4"><a href="#fn:4" rel="footnote">4</a></sup> Writing rules for these differing (and sometimes ambiguous) case-marking paradigms without building in lexical information would be very difficult indeed.</p>
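
<p>To make the problem concrete, the following Python sketch compares a naive suffix rule with a lexical lookup on the three Polish nouns above. Both helpers (<em>strip_a</em> and <em>lookup</em>) are hypothetical, and the three-entry table only stands in for the lexicon a real system would need.</p>

```python
# Accusative -> nominative pairs taken from the Polish examples above.
# Assumption: a real lexicon would be far larger than these three entries.
ACCUSATIVE_TO_NOMINATIVE = {
    "studenta": "student",
    "psa": "pies",
    "stół": "stół",
}


def strip_a(word):
    """Naive rule: treat a final -a as the accusative ending."""
    return word[:-1] if word.endswith("a") else word


def lookup(word):
    """Lexical lookup; returns None for words missing from the table."""
    return ACCUSATIVE_TO_NOMINATIVE.get(word)


for form in ACCUSATIVE_TO_NOMINATIVE:
    print(form, strip_a(form), lookup(form))
```

<p>The suffix rule recovers <em>student</em>, but it turns <em>psa</em> into the non-word <em>ps</em> and cannot tell that <em>stół</em> is accusative at all.</p>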

<h4>Finding the edges</h4>

<p>Recall that the current <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2">Ubiquity Parser 2 design</a> identifies arguments by identifying known delimiters (most often some adposition) as a left or right edge of an argument. By not having to run the nountype detection over every substring of the input, we greatly reduce the processing time needed in each parse. This approach, however, relies on our being able to reliably identify some sort of boundary for each of our arguments.</p>
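
<p>The delimiter idea can be sketched in a few lines of Python. This is a simplification for illustration, not the actual Parser 2 algorithm: the delimiter list and the <em>split_arguments</em> helper are hypothetical, the verb is assumed to have been removed already, and every delimiter is treated as the left edge of the argument that follows it.</p>

```python
# Hypothetical delimiter list; in this sketch each preposition marks the
# left edge of the argument that follows it.
DELIMITERS = {"to", "from", "with"}


def split_arguments(text):
    """Split input into (delimiter, argument) pairs at known delimiters."""
    arguments = []
    delimiter, words = None, []
    for token in text.split():
        if token in DELIMITERS:
            # Close the argument collected so far, if there is one.
            if words or delimiter is not None:
                arguments.append((delimiter, " ".join(words)))
            delimiter, words = token, []
        else:
            words.append(token)
    arguments.append((delimiter, " ".join(words)))
    return arguments


print(split_arguments("hello world from english to french"))
```

<p>Only the handful of resulting chunks needs to be handed to nountype detection, rather than every substring of the input.</p>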

<p>In strongly case-marking languages, the case is realized on the noun itself, but this noun may be buried in the middle of the noun phrase. Even if we could reliably identify the case-marker, it would mark neither the left nor right edge of the argument, making our current parsing strategy worthless. For example, consider the following Arabic example of &#8220;the house of the man&#8221; in nominative and accusative cases:<sup id="fnref:6"><a href="#fn:6" rel="footnote">5</a></sup></p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>5
6
7
8
</pre></td><td class="code"><pre class="ar" style="font-family:monospace;">baytu 'r-rajuli
house=NOM of=man
bayta 'r-rajuli
house=ACC of=man</pre></td></tr></table></div>


<p>In these cases, we see that the only distinction between (5) (بَيتُ الرَّجُلِ) and (7) (بَيتَ الرَّجُلِ) is the case suffix on the head noun, &#8220;house,&#8221; which sits in the middle of the noun phrase. Even if we could properly identify this case ending, it would mark neither the left nor the right edge of the entire argument.</p>

<p>Contrast this with German where, even though arguments have case, the case is realized on the article, not on the noun head itself, so we can essentially deal with these articles as prepositions, using the article as the left edge of the argument.</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>9
10
11
12
</pre></td><td class="code"><pre class="de" style="font-family:monospace;">den     großen Hund
the=ACC big    dog
dem     großen Hund
the=DAT big    dog</pre></td></tr></table></div>
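
<p>Treating the article as a left-edge delimiter could look something like the Python sketch below. The <em>ARTICLE_CASE</em> table and the helper name are hypothetical, and the table covers only the two forms glossed in this example.</p>

```python
# Case-marked articles from the example above, treated like prepositions.
# Assumption: each form maps to one case. Real German articles are ambiguous
# (for instance, "den" is also a dative plural).
ARTICLE_CASE = {"den": "ACC", "dem": "DAT"}


def find_case_marked_argument(text):
    """Return (case, argument) using the first known article as left edge."""
    tokens = text.split()
    for index, token in enumerate(tokens):
        case = ARTICLE_CASE.get(token.lower())
        if case:
            return case, " ".join(tokens[index:])
    return None


print(find_case_marked_argument("den großen Hund"))
print(find_case_marked_argument("dem großen Hund"))
```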


<p>Believe it or not, things can actually get even worse than just not being able to find an edge of our arguments. The worst-case scenario comes from discontinuous constituents, in languages where case marking on both nouns and modifiers allows for very free word order. Latin is just such a language:<sup id="fnref:5"><a href="#fn:5" rel="footnote">6</a></sup></p>

<p>From M. Tullius Cicero, &#8220;Against Catiline,&#8221; chapter 1:<br/>


<div class="wp_syntax"><div class="code"><pre class="la" style="font-family:monospace;">quem        ad finem      sese effrenata                       iactabit         audacia?
what=sg.ACC to extent=ACC self unbridle=perf-past-part.3sg.NOM fling=future.3sg audacity=sg.NOM</pre></div></div>


<br/>
&#8220;To what extent will (your) unbridled audacity fling itself about?&#8221;
</p>

<p>In this example we see that <em>effrenata</em> modifies <em>audacia</em>. The two words do not form a unit in the linear order, but their relationship can be recovered because both carry nominative case marking. It would be unfair to expect Ubiquity ever to parse such arguments properly, so some discipline would be required from the user in any case; still, this illustrates how bad things could get if we took the processing of case-markers to the extreme.</p>

<h4>The proper case for execution</h4>

<p>The final difficulty in processing case-markings in Ubiquity comes from the preview and execution stages of a Ubiquity command&#8217;s usage. That is, after we parse the input, we must give the verb the arguments we found so that it can display a meaningful preview or behave correctly when executed. At this point, what case should the noun be when we hand the string of the argument to the verb?</p>

<p><a href="http://www.flickr.com/photos/43567335@N00/275046371/" title="CAVE CANEM" target="_blank"><img src="http://farm1.static.flickr.com/120/275046371_9080289d04.jpg" alt="CAVE CANEM" border="0" /></a><br /><small><a href="http://creativecommons.org/licenses/by-sa/2.0/" title="Attribution-ShareAlike License" target="_blank"><img src="http://mitcho.com/blog/wp-content/plugins/photo-dropper/images/cc.png" alt="Creative Commons License" border="0" width="16" height="16" align="absmiddle" /></a> <a href="http://www.photodropper.com/photos/" target="_blank">photo</a> credit: <a href="http://www.flickr.com/photos/43567335@N00/275046371/" title="Platinatore" target="_blank">Platinatore</a></small></p>

<p>For example, consider the Latin expression &#8220;cave canem!&#8221; meaning &#8220;beware the dog!&#8221;</p>


<div class="wp_syntax"><div class="code"><pre class="la" style="font-family:monospace;">cave              canem!
beware=imperative dog=sg.ACC</pre></div></div>


<p>Supposing for a moment that we&#8217;ve implemented the <em>cavere</em> (&#8220;beware&#8221;) verb in Ubiquity and properly parsed &#8220;cave canem,&#8221; should we pass the literal string &#8220;canem&#8221; in accusative case to the verb, or the nominative string &#8220;canis,&#8221; or the root &#8220;can-&#8221;? Which is more appropriate? If &#8220;canis&#8221; is the more appropriate choice, Ubiquity would then have to be responsible for declining the accusative into a nominative&#8230; for all case-marked languages. This is clearly a road we do not want to go down.</p>

<h3>Proposal: only support determiners and adpositions</h3>

<p>I&#8217;ve laid out three reasons why processing strongly case-marked languages in Ubiquity is a non-starter. Fortunately, languages often have multiple different strategies for accomplishing similar communicative tasks. One oft-used strategy for <a href="http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/">marking different roles of arguments</a> is the use of <strong>adpositions</strong> (a fancy term for prepositions and postpositions). Unlike case-markers which often are affixes on nouns, prepositions mark the beginning of an argument and postpositions the end, as is used in the current parsing strategy.</p>

<p>From a formal/theoretical perspective, adpositions sit above the noun phrase proper, while modifiers like adjectives live within the noun phrase. This reflects the fact that, with few exceptions, adpositions mark an edge of the noun phrase, which is crucial to our parsing strategy. (Here, PP is a prepositional phrase and NP is a noun phrase.) Note also that for languages such as German which mark case on determiners (D), the same logic holds.</p>

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/05/dcaa2cd9-4c7b-45fd-8a44-75c25b1b5561.jpg" alt="DCAA2CD9-4C7B-45FD-8A44-75C25B1B5561.jpg" border="0" width="126" height="106" style='vertical-align:middle;padding:5px;' /><img src="http://mitcho.com/blog/wp-content/uploads/2009/05/936098e0-425b-43e1-8cec-d188d43cc942.jpg" alt="936098E0-425B-43E1-8CEC-D188D43CC942.jpg" border="0" width="170" height="134" style='vertical-align:middle;padding:5px;' /></p>

<p>Note also that, as long as the case-marking is phrase-marking (i.e. marking the edge of the noun phrase) rather than just affixing to the head noun, our parsing strategy will work. This means we could possibly in the future write a simple RegExp to split off the Basque dative suffix, as it marks the end of the entire noun phrase. This can be seen in the following data from <a href="http://www.loria.fr/~tseng/Pubs/lsk04.pdf">Tseng (2004)</a>, where the suffix <em>-(r)i</em> affixes to the last word in the noun phrase, no matter the part of speech of that last word. (Basque is crazy cool!)</p>

<p><img src="http://mitcho.com/blog/wp-content/uploads/2009/05/picture-2.png" alt="Picture 2.png" border="0" width="539" height="66" /></p>
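
<p>A regular expression for a phrase-final suffix like this one might be sketched as follows in Python. The helper name is hypothetical, and the sample phrase <em>gizon handiari</em> (&#8220;to the big man&#8221;) is my own illustration rather than data from Tseng (2004). The sketch also assumes the phrase is already known to be dative, since the pattern alone cannot tell a case ending from a word that simply ends in <em>i</em>.</p>

```python
import re

# Dative ending from the text: "-ri" or "-i" on the last word of the phrase.
# Assumption: the phrase passed in really is dative; this pattern alone cannot
# tell a case ending from a word that simply ends in "i".
DATIVE_SUFFIX = re.compile(r"r?i$")


def split_phrase_final_dative(phrase):
    """Return (phrase_without_suffix, suffix), or (phrase, None) if absent."""
    stripped = phrase.strip()
    match = DATIVE_SUFFIX.search(stripped)
    if not match:
        return stripped, None
    return stripped[:match.start()], match.group()


print(split_phrase_final_dative("gizon handiari"))
```

<p>Because the suffix sits at the right edge of the whole phrase, it can serve as a delimiter in the same way a postposition does.</p>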

<h3>Conclusion</h3>

<p>In this post I&#8217;ve outlined some reasons why it would be unreasonable or very difficult to incorporate case-marker processing into our <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2">current parser strategy</a>. The case markers themselves are often hard to identify, the case markers do not align with the edges of arguments, and there is the question of what form of the argument should be passed to the verb for preview and/or execution. Luckily many languages allow for adpositions (prepositions and postpositions) as an alternative strategy to case as a means of marking the different grammatical functions of arguments. By limiting Ubiquity parsing to adpositions (and case-marked determiners), I believe we are able to reach a good compromise between each user&#8217;s natural language and an easily machine-processable form.</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>Note that when linguists talk about &#8220;case,&#8221; they could be referring to two different (though related) concepts: case (lowercase) is the observed pattern of affixes on nouns which indicate grammatical function, while Case (uppercase) refers to a theoretical (formal) feature of syntactic objects—certain lexical items &#8220;assign Case&#8221; or &#8220;receive Case&#8221; and its mismatches were ruled out in <a href="http://en.wikipedia.org/wiki/Government and binding theory">GB</a> syntax by the Case Filter. You&#8217;ll find GB linguistics papers referring to &#8220;case&#8221; when discussing Mandarin Chinese, for example, a language that doesn&#8217;t have any overt case (lowercase) and you&#8217;ll know immediately that this usage is an uppercase Case case. In this blog post I&#8217;ll be dealing primarily with the former descriptive notion.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:2">
<p>When I refer to &#8220;strongly case-marking languages,&#8221; I am referring to languages with a non-trivial inventory of cases (not just nominative, accusative, and genitive) and where a noun phrase&#8217;s case is not reflected on <a href="http://en.wikipedia.org/wiki/determiner (class)">determiners</a>. For example, <a href="http://en.wikipedia.org/wiki/German language">German</a> is excluded by this definition as case is realized exclusively on articles and there is no need to find and parse the noun head itself to identify its case—more information on German is in the section &#8220;finding the edges.&#8221;&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:3">
<p>In reality Turkish case morphology does get a little more complicated than this with some consonants shifting as well, but it is still possible to <a href="http://www.sfs.uni-tuebingen.de/iscl/Theses/makedonski.pdf">identify Turkish case with regular expressions</a>.&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:4">
<p>For those of you who were curious, this difference in Polish is based on the differing genders of each of these words. Data from <a href="http://en.wikipedia.org/wiki/Polish language">Polish language</a> on Wikipedia.&#160;<a href="#fnref:4" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:6">
<p>Example from <a href="http://en.wikipedia.org/wiki/Iʻrāb">Iʻrāb</a> on Wikipedia.&#160;<a href="#fnref:6" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:5">
<p>Thank you to <a href="http://bpick.tumblr.com/">Bailey Pickens</a> for help with the Latin data.&#160;<a href="#fnref:5" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/' rel='bookmark' title='Three ways to argue over arguments'>Three ways to argue over arguments</a></li>
<li><a href='http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/' rel='bookmark' title='Contribute: how your language identifies its arguments'>Contribute: how your language identifies its arguments</a></li>
<li><a href='http://mitcho.com/blog/projects/writing-commands-with-semantic-roles/' rel='bookmark' title='Writing commands with semantic roles'>Writing commands with semantic roles</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/in-case-of-case/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Three ways to argue over arguments</title>
		<link>http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/</link>
		<comments>http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/#comments</comments>
		<pubDate>Wed, 18 Feb 2009 03:26:05 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[agreement]]></category>
		<category><![CDATA[ambiguity]]></category>
		<category><![CDATA[Ancient Greek]]></category>
		<category><![CDATA[argument structure]]></category>
		<category><![CDATA[arguments]]></category>
		<category><![CDATA[case]]></category>
		<category><![CDATA[Chinese]]></category>
		<category><![CDATA[coding properties]]></category>
		<category><![CDATA[English]]></category>
		<category><![CDATA[grammatical relations]]></category>
		<category><![CDATA[Hungarian]]></category>
		<category><![CDATA[Japanese language]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mandarin]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[ubiquity]]></category>
		<category><![CDATA[verbs]]></category>
		<category><![CDATA[word order]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1413</guid>
		<description><![CDATA[UPDATE: Contribute information on how your language identifies its arguments here. When we execute a command in Ubiquity, in very simple terms, we&#8217;re hoping to do something (a verb) to some arguments (the nouns). Every sentence in every language uses some method to encode which arguments correspond to which roles of the verb. Here are [...]
Related posts:<ol>
<li><a href='http://mitcho.com/blog/observation/%e5%8f%8e%e9%9b%86-vs-%e5%9b%9e%e5%8f%8e-and-better-word-meanings-through-usage/' rel='bookmark' title='回収 vs. 収集 and Better Word Meanings Through Usage'>回収 vs. 収集 and Better Word Meanings Through Usage</a></li>
<li><a href='http://mitcho.com/blog/observation/testing-googles-language-detection/' rel='bookmark' title='Testing Google&#8217;s Language Detection'>Testing Google&#8217;s Language Detection</a></li>
<li><a href='http://mitcho.com/blog/observation/gaba-shame-on-you/' rel='bookmark' title='Gaba, Shame On You'>Gaba, Shame On You</a></li>
</ol>

Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.]]></description>
			<content:encoded><![CDATA[<p><em>UPDATE: Contribute information on how your language identifies its arguments <a href="http://mitcho.com/blog/projects/contribute-how-your-language-identifies-its-arguments/">here</a>.</em></p>

<p>When we execute a command in Ubiquity, in very simple terms, we&#8217;re hoping to do something (a verb) to some arguments (the nouns). Every sentence in every language uses some method to encode which arguments correspond to which roles of the verb. Here are a couple of examples:</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
</pre></td><td class="code"><pre class="english" style="font-family:monospace;">He sees Mary.
彼が Maryを 見る。 (Kare-ga Mary-o miru.)</pre></td></tr></table></div>


<p>As speakers of English, you can read sentence (1) above and know exactly who is doing the seeing and who is being seen and speakers of Japanese can get the same information from (2). <strong>How do different languages code for arguments in different roles?</strong> There are, broadly speaking, three different ways:</p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/02/threeways.png" alt="three ways to code for arguments in different roles" border="0" width="536" height="284" /></center></p>

<p>We&#8217;ll take a brief look today at these three different strategies, all of which <a href="http://www.azarask.in/blog/post/scaling-ubiquity-to-60-languages-we-need-your-help/">a localizable natural language interface</a> will surely encounter.</p>

<p><span id="more-1413"></span></p>

<h3>Word order</h3>

<p>In many languages, the position of the arguments relative to one another and to the verb determines the roles which each argument will play. Mandarin Chinese is a good example of such a language:</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>3
4
</pre></td><td class="code"><pre class="chinese" style="font-family:monospace;">他 喜欢 Mary (Ta xihuan Mary)
Mary 喜欢 他 (Mary xihuan ta)</pre></td></tr></table></div>


<p>Here, sentence (3) says &#8220;he likes Mary&#8221; while sentence (4) says &#8220;Mary likes him&#8221;. Simply reversing the positions of &#8220;he/him&#8221; and &#8220;Mary&#8221; we&#8217;re able to flip the roles that they fill in the sentence: that of the person who does the liking and the person who is being liked. Now take a look at sentence (5) which means &#8220;John says &#8216;hello&#8217; to Mary.&#8221;</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>5
</pre></td><td class="code"><pre class="chinese" style="font-family:monospace;">John 告诉 Mary &quot;你 好&quot; (John gaosu Mary &quot;ni hao&quot;)</pre></td></tr></table></div>


<p>We note here that, while in English we used a different strategy of marking one argument (we marked the &#8220;Mary&#8221; argument with &#8220;to&#8221;), Chinese doesn&#8217;t mark either of the arguments. There is, however, a clearly defined order to the arguments, which you might encode this way:</p>


<div class="wp_syntax"><div class="code"><pre class="code" style="font-family:monospace;">say [who you're speaking to] [what you're saying]</pre></div></div>


<p>If you swap the order of the two objects in this sentence, it becomes ungrammatical. (<strong>Note:</strong> the asterisk * here means the sentence is <em>ungrammatical</em>.)</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>5
</pre></td><td class="code"><pre class="chinese" style="font-family:monospace;">* John 告诉 &quot;你 好&quot; Mary (John gaosu &quot;ni hao&quot; Mary)</pre></td></tr></table></div>


<p>Here, the word order dictates that &#8220;你好&#8221; must be &#8220;who you&#8217;re speaking to&#8221; and &#8220;Mary&#8221; must be &#8220;what you&#8217;re saying,&#8221; but that doesn&#8217;t make sense, so the sentence is ungrammatical.</p>
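
<p>A purely positional assignment like this can be sketched in Python as follows. The <em>ROLE_ORDER</em> frame and the role names are hypothetical labels for the template above, not part of any Ubiquity API.</p>

```python
# Hypothetical frame for the verb: roles are assigned purely by position.
ROLE_ORDER = {"告诉": ("addressee", "message")}


def assign_by_position(subject, verb, objects):
    """Pair each object with the role its position dictates."""
    roles = {"speaker": subject}
    for role, obj in zip(ROLE_ORDER[verb], objects):
        roles[role] = obj
    return roles


print(assign_by_position("John", "告诉", ["Mary", "你好"]))
# Swapping the objects forces the nonsensical reading described above.
print(assign_by_position("John", "告诉", ["你好", "Mary"]))
```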

<h3>Marking the arguments</h3>

<p>Another possible strategy is to mark each argument (or some of the arguments) so that each argument&#8217;s role is clear. In many languages this is done with <a href="http://en.wikipedia.org/wiki/case marking">case marking</a>. Take for example this Ancient Greek sentence with its English gloss on line (6). Here, NOM refers to <a href="http://en.wikipedia.org/wiki/nominative case">nominative case</a> and ACC refers to <a href="http://en.wikipedia.org/wiki/accusative case">accusative case</a>.<sup id="fnref:2"><a href="#fn:2" rel="footnote">1</a></sup></p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>5
6
</pre></td><td class="code"><pre class="ancient-greek" style="font-family:monospace;">ho  didaskal-os  paideuei to  paidi-on  (SVO)
the teacher -NOM teaches  the boy  -ACC</pre></td></tr></table></div>


<p>This sentence means &#8220;the teacher instructs the boy.&#8221; While sentence (5) is in Subject-Verb-Object order, the other five possible orderings of {subject, verb, object} are also grammatical and mean the same thing:<sup id="fnref:1"><a href="#fn:1" rel="footnote">2</a></sup></p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>7
8
9
10
11
</pre></td><td class="code"><pre class="ancient-greek" style="font-family:monospace;">ho didaskalos to paidion paideuei (SOV)
paideuei ho didaskalos to paidion (VSO)
paideuei to paidion ho didaskalos (VOS)
to paidion ho didaskalos paideuei (OSV)
to paidion paideuei ho didaskalos (OVS)</pre></td></tr></table></div>
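
<p>To see why word order stops mattering once the arguments are marked, here is a small Python sketch that reads roles off the case tags alone. The two-word lexicon is copied from the gloss above and stands in for real morphological analysis; the names <em>LEXICON</em> and <em>assign_roles</em> are hypothetical.</p>

```python
# Case tags copied from the gloss above; the articles are skipped.
# Assumption: a tiny lexicon stands in for real morphological analysis.
LEXICON = {"didaskalos": "NOM", "paidion": "ACC"}
ROLE_FOR_CASE = {"NOM": "subject", "ACC": "object"}
VERB = "paideuei"


def assign_roles(sentence):
    """Assign roles from case tags, ignoring word order entirely."""
    roles = {}
    for word in sentence.split():
        if word == VERB:
            roles["verb"] = word
        elif word in LEXICON:
            roles[ROLE_FOR_CASE[LEXICON[word]]] = word
    return roles


SENTENCES = [
    "ho didaskalos paideuei to paidion",
    "ho didaskalos to paidion paideuei",
    "paideuei ho didaskalos to paidion",
    "paideuei to paidion ho didaskalos",
    "to paidion ho didaskalos paideuei",
    "to paidion paideuei ho didaskalos",
]
results = [assign_roles(sentence) for sentence in SENTENCES]
print(all(result == results[0] for result in results))
```

<p>All six orderings yield the same assignment, with the teacher as subject and the boy as object.</p>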


<p>Many languages also use <a href="http://en.wikipedia.org/wiki/adposition">adpositions</a> (prepositions and/or postpositions) to further clarify the role of an argument in addition to case (like English does) or in lieu of case marking altogether. The idea is the same, though: you want to clarify the roles of the arguments so you morphologically mark the arguments with their roles.</p>

<h3>Marking the verb</h3>

<p>Many languages mark the verb with some information about the argument in a certain role, so that we can properly identify the arguments&#8217; roles. This kind of phenomenon is called <em>agreement</em>.</p>

<p>The most common type of verbal agreement is subject agreement, where the verb is marked by a specific form depending on some features of the subject. Anyone who&#8217;s taken French 101 will recognize this verb conjugation paradigm:</p>

<table>
<tr><th></th><th>subject</th><th>être (to be)</th></tr>
<tr><td rowspan='3'>singular</td><td>je (I)</td><td>suis</td></tr>
<tr><td>tu (you)</td><td>es</td></tr>
<tr><td>il/elle (he/she)</td><td>est</td></tr>
<tr><td rowspan='3'>plural</td><td>nous (we)</td><td>sommes</td></tr>
<tr><td>vous (plural you)</td><td>êtes</td></tr>
<tr><td>ils (they)</td><td>sont</td></tr>
</table>

<p>With this paradigm, if you hear or see &#8220;suis&#8221; in a French sentence, you immediately know that &#8220;je&#8221; (<em>I</em>) must be the subject and if you see &#8220;sommes,&#8221; &#8220;nous&#8221; (<em>we</em>) is the subject, etc. <a href="http://en.wikipedia.org/wiki/Standard Average European">Standard Average European</a> languages tend to exhibit this sort of subject-verb agreement.</p>
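
<p>In code, this kind of inference amounts to a lookup from verb form to subject. The Python sketch below uses only the forms from the table above; the helper name <em>subject_of</em> is hypothetical.</p>

```python
# Forms of être from the table above, mapped to the subject they agree with.
SUBJECT_FOR_FORM = {
    "suis": "je", "es": "tu", "est": "il/elle",
    "sommes": "nous", "êtes": "vous", "sont": "ils",
}


def subject_of(sentence):
    """Infer the subject pronoun from the first known verb form."""
    for word in sentence.lower().split():
        if word in SUBJECT_FOR_FORM:
            return SUBJECT_FOR_FORM[word]
    return None


print(subject_of("nous sommes ici"))
```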

<p>Features of the subject position aren&#8217;t the only thing that can be marked on the verb, though. Hungarian, for example, has a type of object agreement. Specifically, the verb marks whether the object is definite or not (in linguistics lingo, &#8220;the verb agrees with the object&#8217;s definiteness feature&#8221;).</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>12
13
14
15
</pre></td><td class="code"><pre class="hungarian" style="font-family:monospace;">John lát  egy almát.
John sees an  apple
John látja az  almát.
John sees  the apple</pre></td></tr></table></div>


<p>Notice that in sentence (12) (glossed in (13)) the verb for &#8220;see&#8221; is realized as &#8220;lát,&#8221; while in (14) it&#8217;s &#8220;látja.&#8221; A speaker can use that agreement to see whether the object is definite or not and thus limit the possible object arguments out of all the nouns in the sentence.</p>
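
<p>A parser could use the same cue to filter candidate objects. In the Python sketch below, the two tables contain only the verb forms and articles from the example above, and the helper name <em>compatible_objects</em> is hypothetical. It assumes every candidate noun phrase is non-empty and starts with an article.</p>

```python
# Verb forms and articles from the Hungarian example above.
VERB_OBJECT_DEFINITENESS = {"lát": "indefinite", "látja": "definite"}
ARTICLE_DEFINITENESS = {"egy": "indefinite", "az": "definite"}


def compatible_objects(verb, noun_phrases):
    """Keep the noun phrases whose definiteness matches the verb form."""
    wanted = VERB_OBJECT_DEFINITENESS[verb]
    return [phrase for phrase in noun_phrases
            if ARTICLE_DEFINITENESS.get(phrase.split()[0]) == wanted]


print(compatible_objects("látja", ["egy almát", "az almát"]))
print(compatible_objects("lát", ["egy almát", "az almát"]))
```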

<h3>All of the above</h3>

<p><a href='http://www.qwantz.com/'><img src="http://mitcho.com/blog/wp-content/uploads/2009/02/whom.gif" alt="whom.gif" border="0" width="650" height="442" /></a></p>

<p>Most languages do not use only one of these strategies, but a combination of them. English is a very good example. In a sentence like (12) below the main coding of grammatical roles seems to be word order alone. By reversing the word order into (13), we can effectively swap the arguments&#8217; roles.</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>12
13
</pre></td><td class="code"><pre class="english" style="font-family:monospace;">John likes Mary.
Mary likes John.</pre></td></tr></table></div>


<p>However, this doesn&#8217;t work with pronominal arguments. Swapping the arguments in (14) yields (15) which is ungrammatical due to the case marking on the pronouns.</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>14
15
</pre></td><td class="code"><pre class="english" style="font-family:monospace;">He likes her.
* Her likes he.</pre></td></tr></table></div>


<p>In addition, the verb in English must agree with the subject&#8217;s number (singular or plural):</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>16
17
18
</pre></td><td class="code"><pre class="english" style="font-family:monospace;">John likes them.
* They likes John.
They like John.</pre></td></tr></table></div>


<p>In this way, English exhibits all three strategies: word order, case marking, and agreement, although often only word order is actively used to disambiguate the roles of arguments.</p>

<p><strong>Question:</strong> What strategies are used by your language to mark the roles of different arguments?</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:2">
<p>The following example is from <a href="http://www.personal.uni-jena.de/~x4diho/LingTyp%20Grammatical%20relations.ppt">Holger Diessel</a>.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:1">
<p>&#8220;Mean the same thing&#8221; here means that the teacher is always instructing and the boy is always being instructed. The sentences may differ in when or how they are used depending on which argument is being talked about or what the implications of the utterance are. The formal notion is <em>truth-conditional equivalence</em>.&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>
<p>Related posts:</p><ol>
<li><a href='http://mitcho.com/blog/observation/%e5%8f%8e%e9%9b%86-vs-%e5%9b%9e%e5%8f%8e-and-better-word-meanings-through-usage/' rel='bookmark' title='回収 vs. 収集 and Better Word Meanings Through Usage'>回収 vs. 収集 and Better Word Meanings Through Usage</a></li>
<li><a href='http://mitcho.com/blog/observation/testing-googles-language-detection/' rel='bookmark' title='Testing Google&#8217;s Language Detection'>Testing Google&#8217;s Language Detection</a></li>
<li><a href='http://mitcho.com/blog/observation/gaba-shame-on-you/' rel='bookmark' title='Gaba, Shame On You'>Gaba, Shame On You</a></li>
</ol>
<p>Related posts brought to you by <a href='http://yarpp.org'>Yet Another Related Posts Plugin</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/three-ways-to-argue-over-arguments/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
	</channel>
</rss>

