blog

Posts Tagged ‘Catalan’

Inside the Argument

Wednesday, May 13th, 2009

Here’s a little picture of the different sections of text in a single parsed argument and which properties of the resultant argument object they are assigned to.

insidetheargument.jpg

You’ll see, from left to right, outerSpace, modifier, innerSpace, inactivePrefix, input/data, inactiveSuffix.

The example text is from the Catalan example, “compra mitjons amb el Google,” meaning “buy socks with Google.” You’ll notice the argument “amb el Google” is literally “with the Google.” The normalizeArgument() method of the Catalan parser, as I described earlier this week, strips the article “el ” and puts it in the inactivePrefix property of the argument.

I’m going to spend the rest of the day updating Parser 2 design doc and related documentation so they match these and other recent developments in the parser.

Solving Another Romantic Problem: Weak Pronouns

Tuesday, May 12th, 2009

Yesterday I blogged on how to deal with portmanteau’ed prepositions in Ubiquity Parser 2, a common problem in various romance languages. Today I’ll propose an approach to another romantic problem.

The problem:

Weak pronouns in romance languages (as well as some other languages) have a special property where they cliticize to the verb, moving from its regular argument position to a position next to the verb. For example, in French, we have an imperative like (1) with gloss as (2):

1
2
Envoyez  le  lettre à  Pierre!
send.IMP the letter to Pierre

If we replace le lettre or à Pierre with a preposition (le, “it”, or lui, “to him”, respectively), those weak pronouns move next to the verb—in particular, (5) exemplifies the change in word order. Replacing both arguments with prepositions creates the stacked clitic form of (7).1

3
4
5
6
7
8
Envoyez-la à  Pierre!
send   -it to Pierre
Envoyez-lui la  lettre!
send   -him the letter
Envoyez-le-lui!
send   -it-him

The fact that these weak pronouns are attached to the verb and lack separate delimiters mean that we will need a separate mechanism to parse these arguments: indeed, this functionality has been planned in Ubiquity Parser 2 as “step 3”. Here I’ll examine some data and discuss a strategy for the parsing of weak pronouns.

(more…)


  1. Note that the reverse order of “Envoyez-lui-le” is ungrammatical… fortunately we most likely will not have to deal with multiple clitics… see footnote two below. 

Solving a Romantic Problem: Portmanteau’ed Prepositions

Monday, May 11th, 2009

The problem:

In many romance languages, prepositions and articles often form portmanteau morphs, combining to form a single word.1 Some examples include (French) à + le > au, de + le > du, (Catalan) a + el > al, de + les > dels, per + el > pel. Italian has a particularly productive system of portmanteau’ed prepositions and articles… I refer you to the contraction article on Wikipedia.

As I noted a couple weeks ago, however, some combinations do not form portmanteaus.2

(more…)


  1. Thanks to Jeremy O’Brien for helping me figure out how to refer to this phenomenon. 

  2. This also relates to the issue of parsing multi-word delimiters, though the argument normalization strategy covered here should reduce the necessity of multi-word delimiters. 


© 2006-2010 mitcho (Michael 芳貴 Erlewine).
Proudly powered by WordPress.
Entries (RSS) and Comments (RSS).
The views expressed on these pages are mine alone and do not
reflect those of my employers and clients, past and present.