Ubiquity in Italian!
Monday, May 18th, 2009
Thanks to the great work of Sandro Della Giustina, we now have a preliminary Italian parser for use with Ubiquity Parser 2. Sandro brought up a good point, however, about Italian prepositions that contract with the article and attach to the head noun. For example,
traduci dall'inglese al cinese
translate from=the=English to=the Chinese
One current solution is to add a zero-width space after each of these contracted articles, all’ and dall’, so that the contraction and the noun that follows it can be treated as separate words.[1] The appropriate way to add this to the parser is by defining a custom wordBreaker() method.
it._patternCache.contractionMatcher = new RegExp('(^| )(all\'|dall\')','g');
it.wordBreaker = function(input) {
  return input.replace(this._patternCache.contractionMatcher,'$1$2\u200b');
};
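To see what this wordBreaker() does to the example sentence, here is a minimal standalone sketch. The bare `it` object is a hypothetical stand-in for the Italian parser object that Ubiquity Parser 2 provides, and the final split on spaces and zero-width spaces is only an assumption about how the broken-up input might be consumed, not a description of the parser's actual word splitting.

```javascript
// Hypothetical stand-in for the Italian parser object; in Ubiquity the real
// object comes from Parser 2 and carries many more properties.
const it = { _patternCache: {} };

// Match all' or dall' at the start of the input or after a space.
it._patternCache.contractionMatcher = new RegExp("(^| )(all'|dall')", 'g');

// Re-emit the leading space (if any) and the contraction, then append a
// zero-width space (U+200B) so the noun can be separated from it.
it.wordBreaker = function (input) {
  return input.replace(this._patternCache.contractionMatcher, '$1$2\u200b');
};

const broken = it.wordBreaker("traduci dall'inglese al cinese");

// Assumption for illustration only: treat both ordinary spaces and
// zero-width spaces as word boundaries.
const words = broken.split(/[ \u200b]/);
console.log(words);
```

Note that al passes through unchanged: it has no apostrophe and is already separated from cinese by an ordinary space, so only the elided forms all' and dall' need the extra break.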
Grazie Sandro!
[1] As John Daggett pointed out to me, in the future we may have to add an intermediate shallow parse instead of adding characters (in this case, the zero-width space) to the modified input.