<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>mitcho.com &#187; Portuguese</title>
	<atom:link href="http://mitcho.com/blog/tag/portuguese/feed/" rel="self" type="application/rss+xml" />
	<link>http://mitcho.com</link>
	<description></description>
	<lastBuildDate>Thu, 29 Jul 2010 19:14:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Solving Another Romantic Problem: Weak Pronouns</title>
		<link>http://mitcho.com/blog/projects/solving-another-romantic-problem/</link>
		<comments>http://mitcho.com/blog/projects/solving-another-romantic-problem/#comments</comments>
		<pubDate>Tue, 12 May 2009 08:09:31 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[Catalan]]></category>
		<category><![CDATA[French]]></category>
		<category><![CDATA[Italian]]></category>
		<category><![CDATA[Modern Greek]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[Portuguese]]></category>
		<category><![CDATA[romance languages]]></category>
		<category><![CDATA[Spanish]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=2035</guid>
		<description><![CDATA[Yesterday I blogged on how to deal with portmanteau&#8217;ed prepositions in Ubiquity Parser 2, a common problem in various romance languages. Today I&#8217;ll propose an approach to another romantic problem. The problem: Weak pronouns in romance languages (as well as some other languages) have a special property where they cliticize to the verb, moving from [...]


]]></description>
			<content:encoded><![CDATA[<p><em>Yesterday I blogged on <a href="http://mitcho.com/blog/projects/solving-a-romantic-problem-portmanteaued-prepositions/">how to deal with portmanteau&#8217;ed prepositions in Ubiquity Parser 2</a>, a common problem in various romance languages. Today I&#8217;ll propose an approach to another romantic problem.</em></p>

<h3>The problem:</h3>

<p>Weak pronouns in <a href="http://en.wikipedia.org/wiki/romance languages">romance languages</a> (as well as some other languages) have a special property where they <em>cliticize</em> to the verb, moving from their regular argument position to a position next to the verb. For example, in French, we have an imperative like (1), glossed in (2):</p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
</pre></td><td class="code"><pre class="fr" style="font-family:monospace;">Envoyez  le  lettre à  Pierre!
send.IMP the letter to Pierre</pre></td></tr></table></div>


<p>If we replace <em>la lettre</em> or <em>à Pierre</em> with a pronoun (<em>la</em>, &#8220;it&#8221;, or <em>lui</em>, &#8220;to him&#8221;, respectively), those weak pronouns move next to the verb; in particular, (5) exemplifies the change in word order. Replacing both arguments with pronouns creates the stacked clitic form of (7).<sup id="fnref:3"><a href="#fn:3" rel="footnote">1</a></sup></p>


<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>3
4
5
6
7
8
</pre></td><td class="code"><pre class="fr" style="font-family:monospace;">Envoyez-la à  Pierre!
send   -it to Pierre
Envoyez-lui la  lettre!
send   -him the letter
Envoyez-la-lui!
send   -it-him</pre></td></tr></table></div>


<p>The fact that these weak pronouns are attached to the verb and lack separate delimiters means that we will need a separate mechanism to parse these arguments: indeed, this functionality has been planned in <a href="https://wiki.mozilla.org/Labs/Ubiquity/Parser_2">Ubiquity Parser 2</a> as &#8220;step 3&#8221;. Here I&#8217;ll examine some data and discuss a strategy for the parsing of weak pronouns.</p>

<p><span id="more-2035"></span></p>

<h3>Weak pronouns in Ubiquity</h3>

<p>In Ubiquity the only pronoun we currently deal with is the deictic <code>object</code>-role anaphor, like &#8220;it,&#8221; &#8220;this,&#8221; etc. in English.<sup id="fnref:1"><a href="#fn:1" rel="footnote">2</a></sup> In addition, as these weak pronoun clitics cannot by definition be embedded within a larger noun phrase, their referent would constitute the entire <code>object</code> argument. As such, it is most logical to place clitic handling before argument structure parsing and simply hand the argument parser the argument string without the clitic.</p>

<h3>Marking the clitic</h3>

<p>We can classify languages with cliticized weak pronouns into two cases based on their processing considerations: languages that overtly mark the clitic and those which do not.</p>

<h4>Languages which delimit the clitic</h4>

<p>Some languages, such as French (see above), clearly mark the boundary between the verb and the clitic. It will be relatively easy to parse weak pronouns in such languages, as we can simply <a href="http://ubiquity.mozilla.com/trac/ticket/665">insert a zero-width space</a> between the verb and the clitic. A list of clitics can then be designated in the parser (much like anaphora are now) and these weak pronouns can be interpreted as the selection (or &#8220;this&#8221;-referent).<sup id="fnref:2"><a href="#fn:2" rel="footnote">3</a></sup></p>

<p><strong>Portuguese:</strong> (from <a href="http://email.eva.mpg.de/~cysouw/pdf/cysouwDGFS.pdf">Cysouw 2003</a>)</p>


<div class="wp_syntax"><div class="code"><pre class="pr" style="font-family:monospace;">Come-o
eat -it</pre></div></div>


<p><strong>Catalan:</strong> (from <a href="http://www.cau.cat/blog/">Toni Hermoso Pulido</a>)</p>


<div class="wp_syntax"><div class="code"><pre class="ca" style="font-family:monospace;">Cerca-ho
search-it</pre></div></div>


<p><strong>Modern Greek:</strong> (from <a href="http://aix1.uottawa.ca/~romlab/pubs/RiveroTerzi.1995.pdf">Rivero and Terzi 1995</a>; I know, I know, Greek&#8217;s not a romance language, but it has weak pronoun clitics too&#8230; it&#8217;s all good.)</p>

<p>Modern Greek actually inserts a space between the verb and weak pronouns.</p>


<div class="wp_syntax"><div class="code"><pre class="ca" style="font-family:monospace;">Diavase to
read   -it</pre></div></div>
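
<p>As a rough illustration of the delimited case, here is a minimal sketch in Python (not the JavaScript that Ubiquity itself is written in, and not Parser 2 code). The names <code>CLITIC_RULES</code> and <code>split_delimited_clitic</code> are hypothetical, and the clitic lists contain only the forms from the examples above. It assumes the clitic always directly follows the first delimiter and does not attempt stacked clitics.</p>

```python
# Hypothetical per-language rules: the delimiter between verb and clitic,
# and the object clitics to look for. Only forms from the examples above.
CLITIC_RULES = {
    "fr": ("-", {"le", "la"}),
    "pt": ("-", {"o"}),
    "ca": ("-", {"ho"}),
    "el": (" ", {"to"}),  # Modern Greek separates the clitic with a space
}


def split_delimited_clitic(text, lang):
    """Return (verb, clitic, rest) if the verb carries a listed clitic.

    Returns None when the language has no rule, the delimiter is absent,
    or the word after the delimiter is not a listed clitic.
    """
    rule = CLITIC_RULES.get(lang)
    if rule is None:
        return None
    delimiter, clitics = rule
    verb, found, tail = text.partition(delimiter)
    # The verb must be the first word; a delimiter later in the input
    # (for example a hyphenated noun) is not treated as a clitic boundary.
    if not found or " " in verb:
        return None
    # The clitic candidate is whatever precedes the next space.
    candidate, _, rest = tail.partition(" ")
    if candidate.lower() not in clitics:
        return None
    return verb, candidate, rest
```

<p>Under these assumptions, <code>split_delimited_clitic("Envoyez-la à Pierre", "fr")</code> separates the verb <code>Envoyez</code> and the clitic <code>la</code> from the remaining argument string <code>à Pierre</code>, which could then be handed to the argument parser while the clitic is interpreted as the selection.</p>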


<h4>Languages which do not delimit the clitic</h4>

<p>Some languages do not insert any delimiter between the verb and the weak pronoun, essentially entering them as a single word (in the string sense, at least). These cases may be more difficult to parse, especially as there may be sound changes to the verb stem itself.</p>

<p><strong>Italian:</strong> (first example from <a href="http://books.google.com/books?id=tnXJVbGpMfEC">Kayne 1994</a>)</p>

<p>Italian is a case where some clitics actually conjoin with the verb in imperatives, much like Italian prepositions, which, as I noted yesterday, have an elaborate system of portmanteau&#8217;ed forms.</p>


<div class="wp_syntax"><div class="code"><pre class="it" style="font-family:monospace;">Fallo
do-it
Mangialo
eat  -it</pre></div></div>


<p><strong>Spanish:</strong> (first example from <a href="http://aix1.uottawa.ca/~romlab/pubs/RiveroTerzi.1995.pdf">Rivero and Terzi 1995</a>, second from <a href="http://www.cau.cat/blog/">Toni Hermoso Pulido</a>)</p>

<p>Spanish is the same way:</p>


<div class="wp_syntax"><div class="code"><pre class="es" style="font-family:monospace;">Léelo
read-it
Búscalo
search-it</pre></div></div>
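
<p>One possible rule-based approach to the undelimited case, sketched in Python under loud assumptions: the parser already has a list of imperative forms (here the hypothetical <code>KNOWN_IMPERATIVES</code>, with Italian <em>fa</em> written without its apostrophe), and the only sound changes handled are the two visible in the examples above, namely the added accent in Spanish and the doubled consonant in Italian <em>fallo</em>. All names are made up for illustration.</p>

```python
import unicodedata

# Hypothetical data: imperative forms the parser is assumed to know, and
# the object clitic that may be suffixed to them (from the examples above).
KNOWN_IMPERATIVES = {
    "it": {"fa", "mangia"},
    "es": {"lee", "busca"},
}
SUFFIX_CLITICS = {"it": ("lo",), "es": ("lo",)}


def strip_accents(word):
    """Remove combining accent marks, so that 'lée' becomes 'lee'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def candidate_stems(stem, clitic):
    """Yield the stem as written, then variants undoing known sound changes."""
    yield stem
    # Spanish adds a written accent when the clitic attaches: léelo from lee.
    yield strip_accents(stem)
    # Italian doubles the clitic's first consonant: fallo from fa + lo.
    if stem.endswith(clitic[0]):
        yield stem[:-1]


def split_suffixed_clitic(word, lang):
    """Return (verb, clitic) if word is a known imperative plus a clitic."""
    known = KNOWN_IMPERATIVES.get(lang, set())
    lowered = word.lower()
    for clitic in SUFFIX_CLITICS.get(lang, ()):
        if not lowered.endswith(clitic):
            continue
        stem = lowered[: -len(clitic)]
        for candidate in candidate_stems(stem, clitic):
            if candidate in known:
                return candidate, clitic
    return None
```

<p>Checking candidate stems against known verb forms, rather than stripping any word-final <em>lo</em>, keeps ordinary words that merely end in those letters from being misread as verb plus clitic.</p>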


<h3>Displaying the suggestion</h3>

<p>The current Ubiquity handling of anaphora (aka &#8220;magic words&#8221;) involves a display of the selection (replacement) text in a stylized way. One problem with clitics may be how to visually present this replacement to the user.</p>

<p><center><img src="http://mitcho.com/blog/wp-content/uploads/2009/05/picture-11.png" alt="Picture 1.png" border="0" width="284" height="160" /></center></p>

<p>For languages with a delimiter, such as French, we could simply present the selection as an object right after the verb, without the hyphen.</p>

<table>
<tr><th>input:</th><td>traduisez-le (translate-it)</td></tr>
<tr><th>suggestion:</th><td>traduisez <span style='  padding: 2px;
  -moz-border-radius: 3px;
  display: inline-block;
  font-variant: small-caps;
  background-color: #BBB;
  color: #333;
  position: relative;
  top: -2px;
  font-size: 8pt;
  font-weight: normal;
  border: 1px solid #777;'>selection</span></td></tr>
</table>

<p>Things may be more complicated, however, in languages where the clitic is not delimited from the verb, or where the verb form itself has changed due to the attachment of the clitic.</p>

<h3>Conclusion</h3>

<p>In this blog post I&#8217;ve tried to lay out some of the weak pronoun phenomena relevant to Ubiquity, with some ideas on how to implement their parsing. I believe parsing weak pronouns should be relatively straightforward in languages with delimiters. For those which do not have delimiters, some creativity may be required in building regular expressions or rules to detect the clitics and in presenting these suggestions to the user.</p>

<p><strong>Does your language have weak pronoun clitics? What do you think will be the challenges in trying to parse these arguments?</strong></p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:3">
<p>Note that the reverse order of &#8220;Envoyez-lui-la&#8221; is ungrammatical&#8230; fortunately we most likely will not have to deal with multiple clitics&#8230; see footnote two below.&#160;<a href="#fnref:3" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:1">
<p>This is not so much an informed decision that we should not handle different kinds of anaphors but simply that we haven&#8217;t gotten around to implementing them. I personally am not sure, however, whether there is a real need to parse anaphors for roles other than <code>object</code> (for example, French <em>lui</em> as seen above, which would be a <code>goal</code> anaphor).&#160;<a href="#fnref:1" rev="footnote">&#8617;</a></p>
</li>

<li id="fn:2">
<p>There is, however, a question of whether weak pronoun replacement should be obligatory or not. Right now, if we see a regular anaphor such as &#8220;this,&#8221; we make two copies of the parse: one with the replacement, one without. In the case where we detect a weak pronoun clitic, should the replacement be obligatory? I believe it should be, though, as with many other Parser 2 features, I believe we can continue to parse other options with no replacement but let the scoring system kill those parses off. If a verb has a clitic attached to it but we do not remove it, that parse will most likely do very poorly in scoring anyway.&#160;<a href="#fnref:2" rev="footnote">&#8617;</a></p>
</li>

</ol>
</div>


]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/projects/solving-another-romantic-problem/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ubiquity in Portuguese</title>
		<link>http://mitcho.com/blog/link/ubiquity-in-portuguese/</link>
		<comments>http://mitcho.com/blog/link/ubiquity-in-portuguese/#comments</comments>
		<pubDate>Thu, 05 Mar 2009 07:49:33 +0000</pubDate>
		<dc:creator>mitcho</dc:creator>
				<category><![CDATA[link]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[l10n]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mozilla Planet]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[Portuguese]]></category>
		<category><![CDATA[ubiquity]]></category>

		<guid isPermaLink="false">http://mitcho.com/blog/?p=1547</guid>
		<description><![CDATA[Felipe, a Ubiquity user, put together a wonderful look at what Ubiquity might look like in Portuguese. He has some great points here, particularly regarding the &#8220;map&#8221; verb used in English: Felipe points out that Portuguese does not have a very common &#8220;map&#8221; verb and that it would be much more common to enter me [...]


]]></description>
			<content:encoded><![CDATA[<p>Felipe, a Ubiquity user, put together a wonderful look at <a href="http://felipe.wordpress.com/2009/03/03/thinking-ubiquity-in-portuguese/">what Ubiquity might look like in Portuguese</a>. He has some great points here, particularly regarding the &#8220;map&#8221; verb used in English: Felipe points out that Portuguese does not have a very common &#8220;map&#8221; verb and that it would be much more common to enter <code>me dê</code> (literally <em>me give</em>), using a verb to <em>request</em> a map. This is a great example of how Jono&#8217;s <a href="http://jonoscript.wordpress.com/2009/01/24/overlord-verbs-a-proposal/">overlord verbs proposal</a> may be an important aspect of our i18n efforts. The post is also timely, as we&#8217;ve recently been discussing in our <a href="https://wiki.mozilla.org/Labs/Ubiquity/Meetings">regular meetings</a> (open to all!) that Portuguese may be the focus of our next parser construction efforts.</p>

<p><strong>What would the challenges be for Ubiquity in your language?</strong> We&#8217;d love to see an increasing number of blog posts on this topic in different languages. Thanks Felipe! ^^</p>


]]></content:encoded>
			<wfw:commentRss>http://mitcho.com/blog/link/ubiquity-in-portuguese/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
