mitcho Michael 芳貴 Erlewine

Postdoctoral fellow, McGill Linguistics.


Solving a Romantic Problem: Portmanteau’ed Prepositions

The problem:

In many [[Romance languages]], prepositions and articles form [[portmanteau|portmanteau morphs]], combining into a single word.[1] Some examples include (French) à + le > au, de + le > du, (Catalan) a + el > al, de + els > dels, per + el > pel. Italian has a particularly productive system of portmanteau’ed prepositions and articles… I refer you to the [[Contraction (grammar)#Italian|contraction]] article on Wikipedia.

As I noted a couple of weeks ago, however, some combinations do not form portmanteaus.[2]


  1. à + le > au
  2. à + la > à la

The problem is that if we use both à and au as delimiters, the definite article gets passed to the verb as part of the argument in some cases (after à, which leaves the article behind) but not in others (after au, which has already absorbed it).

  1. “à la table” = “to the table”
  2. “au chat” = “to the cat”
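To make the asymmetry concrete, here is a minimal sketch of a delimiter split. The function splitOnDelimiter() and its return shape are hypothetical, invented for illustration; this is not how Parser 2 actually tokenizes input.

```javascript
// Hypothetical helper (not part of Parser 2): strip a leading delimiter
// and return whatever is left over as the argument.
function splitOnDelimiter(input, delimiters) {
  for (var i = 0; i < delimiters.length; i++) {
    var lead = delimiters[i] + ' ';
    if (input.indexOf(lead) === 0) {
      return { delimiter: delimiters[i], argument: input.slice(lead.length) };
    }
  }
  return null; // no delimiter found at the start of the input
}

// The article survives in one argument but not in the other:
splitOnDelimiter('à la table', ['au', 'à']); // argument: 'la table'
splitOnDelimiter('au chat', ['au', 'à']);    // argument: 'chat'
```

The verb ends up seeing “la table” in the first case but a bare “chat” in the second, which is exactly the inconsistency described above.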

The solution:

The solution is a new step in the Parser 2 process which normalizes the form of arguments. Each language’s parser can now optionally define a normalizeArgument() method which takes an argument and returns a list of normalized alternates. Normalized arguments are returned in the form of {prefix: '', newInput: '', suffix: ''}. For example, if you feed “la table” to the French normalizeArgument(), it ought to return

[{prefix: 'la ', newInput: 'table', suffix: ''}]

If there are no possible normalizations, normalizeArgument() should simply return []. Each alternative returned by normalizeArgument() is substituted into a copy of the possible parses just before nountype detection. The prefixes and suffixes are stored in the argument (as inactivePrefix and inactiveSuffix) so they can be incorporated into the suggestion display.
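The substitution step might look roughly like the sketch below. Everything about the parse shape here ({verb, args}), the function name applyNormalizations(), and the decision to keep the unnormalized parse as a candidate are my assumptions for illustration; only inactivePrefix, inactiveSuffix and the {prefix, newInput, suffix} form come from the description above.

```javascript
// Hypothetical sketch: parse objects are assumed to look like
// { verb: ..., args: { role: { input: ... } } }.
function applyNormalizations(parse, role, normalizeArgument) {
  var original = parse.args[role];
  var results = [parse]; // assumption: the unnormalized parse stays a candidate

  normalizeArgument(original.input).forEach(function (alt) {
    // Shallow-copy the parse so the original is left untouched.
    var copy = { verb: parse.verb, args: Object.assign({}, parse.args) };
    copy.args[role] = {
      input: alt.newInput,        // what nountype detection will see
      inactivePrefix: alt.prefix, // kept only for the suggestion display
      inactiveSuffix: alt.suffix
    };
    results.push(copy);
  });

  return results;
}
```

If normalizeArgument() returns [], the result is just the original parse, so languages that do not define any normalizations are unaffected.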

Here, for example, is how the inactive prefix “l’” is displayed in the parser demo. This way the user is told that the “l’” prefix is being ignored, and the nountype detection and verb action can act on the argument “English”.[3] (In the future, of course, we could teach this nountype to accept the Catalan “anglès”.)

[Screenshot: the parser demo displaying the inactive prefix “l’”]

The easiest way to produce this output is to use the String.match() method. For examples of normalizeArgument() code, I refer you to the Catalan and French parser files.
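As a rough idea of the String.match() approach, here is a minimal sketch of what a French normalizeArgument() could look like. The regular expression and the list of articles are my own illustrative assumptions, not the code that ships in the French parser file.

```javascript
// Illustrative sketch only; the real French parser may differ.
function normalizeArgument(input) {
  // Group 1: a leading definite article (with its trailing whitespace,
  // or the elided form l' / l’). Group 2: the rest of the argument.
  var matches = input.match(/^(le\s+|la\s+|les\s+|l['’])(.+)$/i);
  if (matches === null) {
    return []; // no possible normalizations
  }
  return [{ prefix: matches[1], newInput: matches[2], suffix: '' }];
}

normalizeArgument('la table');
// → [{prefix: 'la ', newInput: 'table', suffix: ''}]
normalizeArgument('chat');
// → []
```

Requiring whitespace after le, la and les keeps ordinary words that merely begin with those letters, like “lettre”, from being split.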

I hope that this solution will help make Ubiquity with Parser 2 feel more natural for many Romance languages.

  1. Thanks to Jeremy O’Brien for helping me figure out how to refer to this phenomenon. 

  2. This also relates to the issue of parsing multi-word delimiters, though the argument normalization strategy covered here should reduce the necessity of multi-word delimiters. 

  3. Thank you to contributor Toni Hermoso Pulido for our first attempt at a Catalan parser! 


  • Blair McBride

    Shouldn't that be an object, rather than an array?

  • mitcho

    Blair, are you referring to the output of normalizeArgument()? It may indeed make more sense as an object, although the current spec lets it trivially be the output of String.match(). Hmph.

    • mitcho

      Alright, upon discussion with Blair I'm updating the spec so that the output of normalizeArgument() is of the form {prefix: '', newInput: '', suffix: ''}. For example, {prefix: 'la ', newInput: 'table', suffix: ''}.