Posts Tagged ‘ubiquity’

Notes from BarCamp Tokyo 2009

Monday, May 18th, 2009

This past Saturday was Tokyo BarCamp 2009 at Sun’s Yoga offices. I of course gave a presentation on Ubiquity and our recent localization efforts, including Parser 2. As you can see, I signed up quickly:

ubiquity-wall-650.jpg
CC-BY-NC iMorpheus

Here are the slides I used in that session. There are two “demo” sections in the slides… the first was a simple demo of Ubiquity 0.1.x showing off the translate, map, and edit-page commands. The second was a demo of Ubiquity Parser 2, showing off how little code it takes to add your language to Ubiquity.

(more…)

Ubiquity in Italian!

Monday, May 18th, 2009

Thanks to the great work of Sandro Della Giustina, we now have a preliminary Italian parser for use with Ubiquity Parser 2. Sandro brought up a good point, however, about Italian prepositions which contract with the article and the head noun. For example,

traduci   dall'inglese     al     cinese
translate from=the=English to=the Chinese

One current solution is to add zero-width spaces after these contracted articles, all’ and dall’.1 The appropriate way to add this to the parser is by defining a custom wordBreaker() method.

it._patternCache.contractionMatcher = new RegExp('(^| )(all\'|dall\')','g');
it.wordBreaker = function(input) {
  return input.replace(this._patternCache.contractionMatcher,'$1$2\u200b');
};
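
To see the effect, here is a small standalone sketch of the same wordBreaker(). The `it` object below is only a bare stand-in for the Italian parser (in Parser 2 it would already exist), and the final split on spaces and zero-width spaces is my assumption about how a later step might tokenize the result, not a description of the parser's internals.

```javascript
// Stand-in for the Italian parser object; in Parser 2 this would already exist.
var it = { _patternCache: {} };

it._patternCache.contractionMatcher = new RegExp('(^| )(all\'|dall\')', 'g');
it.wordBreaker = function(input) {
  // Insert a zero-width space (U+200B) right after each contracted form.
  return input.replace(this._patternCache.contractionMatcher, '$1$2\u200b');
};

var broken = it.wordBreaker("traduci dall'inglese al cinese");

// Assumed tokenization step: treat the zero-width space like any other break.
var words = broken.split(/[ \u200b]+/);
console.log(words.join(' | '));
```

With the zero-width space in place, dall' and inglese come out as separate words, so dall' can be matched as a delimiter on its own.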

Grazie Sandro!


  1. As John Daggett pointed out to me, in the future we may have to add an intermediate shallow parse instead of adding characters (in this case, the zero-width space) to the modified input. 

Inside the Argument

Wednesday, May 13th, 2009

Here’s a little picture of the different sections of text in a single parsed argument and which properties of the resultant argument object they are assigned to.

insidetheargument.jpg

You’ll see, from left to right, outerSpace, modifier, innerSpace, inactivePrefix, input/data, inactiveSuffix.

The example text is from the Catalan example, “compra mitjons amb el Google,” meaning “buy socks with Google.” You’ll notice the argument “amb el Google” is literally “with the Google.” The normalizeArgument() method of the Catalan parser, as I described earlier this week, strips the article “el ” and puts it in the inactivePrefix property of the argument.
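
As a rough sketch of how those pieces fit together, the snippet below builds an argument object for “ amb el Google” using the property names from the picture. The normalizeArgument() here is a hypothetical stand-in written for this illustration (the real method's signature and return shape may differ), and makeArgument() is an invented helper.

```javascript
// Hypothetical stand-in for the Catalan parser's normalizeArgument();
// the real method's signature and return shape may differ.
function normalizeArgument(input) {
  var match = /^(el\s+)(.+)$/i.exec(input);
  if (match === null) {
    return { prefix: '', newInput: input, suffix: '' };
  }
  return { prefix: match[1], newInput: match[2], suffix: '' };
}

// Invented helper: assemble an argument object with the property names
// shown in the picture.
function makeArgument(outerSpace, modifier, innerSpace, rawText) {
  var normalized = normalizeArgument(rawText);
  return {
    outerSpace: outerSpace,
    modifier: modifier,
    innerSpace: innerSpace,
    inactivePrefix: normalized.prefix,
    input: normalized.newInput,
    inactiveSuffix: normalized.suffix
  };
}

// " amb el Google" from "compra mitjons amb el Google"
var arg = makeArgument(' ', 'amb', ' ', 'el Google');
console.log(JSON.stringify(arg));
```

Concatenating the six properties in order gives back the original text of the argument, which is the point of keeping the inactive pieces around rather than discarding them.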

I’m going to spend the rest of the day updating the Parser 2 design doc and related documentation so they match these and other recent developments in the parser.

Solving Another Romantic Problem: Weak Pronouns

Tuesday, May 12th, 2009

Yesterday I blogged on how to deal with portmanteau’ed prepositions in Ubiquity Parser 2, a common problem in various Romance languages. Today I’ll propose an approach to another romantic problem.

The problem:

Weak pronouns in Romance languages (as well as some other languages) have a special property where they cliticize to the verb, moving from their regular argument position to a position next to the verb. For example, in French, we have an imperative like (1) with gloss as (2):

(1) Envoyez  la  lettre à  Pierre!
(2) send.IMP the letter to Pierre

If we replace la lettre or à Pierre with a pronoun (la, “it”, or lui, “to him”, respectively), those weak pronouns move next to the verb. In particular, (5) exemplifies the change in word order. Replacing both arguments with pronouns creates the stacked clitic form of (7).1

(3) Envoyez-la à  Pierre!
(4) send   -it to Pierre
(5) Envoyez-lui la  lettre!
(6) send   -him the letter
(7) Envoyez-le-lui!
(8) send   -it-him

The fact that these weak pronouns are attached to the verb and lack separate delimiters means that we will need a separate mechanism to parse these arguments: indeed, this functionality has been planned in Ubiquity Parser 2 as “step 3”. Here I’ll examine some data and discuss a strategy for the parsing of weak pronouns.
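
To make the problem concrete, here is a toy sketch of peeling hyphen-attached clitics off a French imperative. This is not the “step 3” strategy itself, just one naive possibility: the pronoun list is limited to the forms in the examples above, and splitClitics() is a name invented for this illustration.

```javascript
// Hypothetical list of French weak pronouns, limited to the forms that
// appear in the examples above.
var WEAK_PRONOUNS = ['le', 'la', 'lui'];

// Invented helper: split a verb token into the verb and any clitics
// attached to its right edge with hyphens.
function splitClitics(verbToken) {
  var parts = verbToken.split('-');
  var clitics = [];
  // Peel known weak pronouns off the right edge, keeping their order.
  while (parts.length > 1 &&
         WEAK_PRONOUNS.indexOf(parts[parts.length - 1].toLowerCase()) !== -1) {
    clitics.unshift(parts.pop());
  }
  return { verb: parts.join('-'), clitics: clitics };
}

var stacked = splitClitics('Envoyez-le-lui');
console.log(stacked.verb, stacked.clitics);
```

Even this toy version shows why clitics need their own mechanism: the arguments have no delimiter of their own, so they have to be recognized from a closed list after the verb has been found.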

(more…)


  1. Note that the reverse order of “Envoyez-lui-le” is ungrammatical… fortunately we most likely will not have to deal with multiple clitics… see footnote two below. 

Solving a Romantic Problem: Portmanteau’ed Prepositions

Monday, May 11th, 2009

The problem:

In many Romance languages, prepositions and articles often form portmanteau morphs, combining to form a single word.1 Some examples include (French) à + le > au, de + le > du, (Catalan) a + el > al, de + els > dels, per + el > pel. Italian has a particularly productive system of portmanteau’ed prepositions and articles… I refer you to the contraction article on Wikipedia.

As I noted a couple weeks ago, however, some combinations do not form portmanteaus.2
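
A few of the contractions listed above can be captured in a small lookup table. The sketch below is only an illustration of the data involved: the table layout, the language codes as keys, and the expandContraction() name are all my own assumptions, not part of Parser 2.

```javascript
// A few contractions from the examples above, keyed by language code.
// The table layout and the function name are illustrative assumptions.
var CONTRACTIONS = {
  fr: { au: ['à', 'le'], du: ['de', 'le'] },
  ca: { al: ['a', 'el'], pel: ['per', 'el'] }
};

// Return [preposition, article] for a known contraction,
// or the word unchanged (as a one-element array) otherwise.
function expandContraction(lang, word) {
  var table = CONTRACTIONS[lang] || {};
  var key = word.toLowerCase();
  if (Object.prototype.hasOwnProperty.call(table, key)) {
    return table[key].slice();
  }
  return [word];
}

console.log(expandContraction('fr', 'au').join(' + '));
console.log(expandContraction('ca', 'pel').join(' + '));
```

Words that are not in the table pass through untouched, which matters because some preposition and article combinations do not contract at all.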

(more…)


  1. Thanks to Jeremy O’Brien for helping me figure out how to refer to this phenomenon. 

  2. This also relates to the issue of parsing multi-word delimiters, though the argument normalization strategy covered here should reduce the necessity of multi-word delimiters. 

In Case of Case…

Wednesday, May 6th, 2009

A hot topic of recent discussion in the Ubiquity i18n realm has been how to deal with strongly case-marking languages. As we continue to make steady progress, this is one of the remaining open questions which we must decide as a community how to tackle in Parser 2.

Introduction

Grammatical case is a marking on nouns that expresses grammatical function. Not all languages exhibit case. In many of the Indo-European languages we hope to bring Ubiquity to, case is realized as a suffix.1

Here’s a classic example of case from Latin. (Line 2 is the gloss of 1, line 4 of 3.)

(1) canis      virum      momordit
(2) dog=sg.NOM man=sg.ACC bite=3sg.perfect
(3) vir        canem      momordit
(4) man=sg.NOM dog=sg.ACC bite=3sg.perfect

Example (1) is “the dog bit the man,” while example (3) is “the man bit the dog.” The only difference, as you see in the gloss, is that the nouns canis and vir are marked with different case endings in the two sentences. By marking the nouns with different cases (here, nominative and accusative), their semantic roles in the sentence (which is the biter and which is the bitee) can be identified unambiguously. (Their positions are also switched in these examples, but in reality Latin has a very free word order: the same sentences with other word orders, including OSV or VSO, are also common.)

At first glance, strongly case-marked languages may look like a godsend for identifying the semantic roles of arguments.2 If we can easily and unambiguously recognize arguments’ cases to put them in their appropriate semantic roles, this could simplify processing as well as make Ubiquity input follow a natural syntax for such languages. Unfortunately, there are some significant challenges which must be overcome in order to make the processing of case-markers worthwhile.
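
Here is a toy sketch of that idea using the four Latin noun forms from the example. It looks forms up in a tiny hand-written lexicon rather than parsing suffixes, and the mapping from case to role (nominative is the biter, accusative the bitee) is an assumption that only holds for this one verb. All names are invented for the illustration.

```javascript
// Toy lexicon covering only the noun forms in the Latin example.
var LEXICON = {
  canis: { noun: 'dog', nounCase: 'NOM' },
  canem: { noun: 'dog', nounCase: 'ACC' },
  vir:   { noun: 'man', nounCase: 'NOM' },
  virum: { noun: 'man', nounCase: 'ACC' }
};

// Assumed case-to-role mapping for this one verb.
var ROLE_FOR_CASE = { NOM: 'biter', ACC: 'bitee' };

// Assign roles purely from case, ignoring where each word sits.
function assignRoles(sentence) {
  var roles = {};
  sentence.split(/\s+/).forEach(function(word) {
    var key = word.toLowerCase();
    if (Object.prototype.hasOwnProperty.call(LEXICON, key)) {
      var entry = LEXICON[key];
      roles[ROLE_FOR_CASE[entry.nounCase]] = entry.noun;
    }
  });
  return roles;
}

console.log(JSON.stringify(assignRoles('canis virum momordit')));
console.log(JSON.stringify(assignRoles('virum canis momordit')));
```

Both word orders produce the same role assignment, which is exactly the appeal. The catch is that a real parser has no such lexicon to lean on and would have to recognize the case endings themselves.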

(more…)


  1. Note that when linguists talk about “case,” they could be referring to two different (though related) concepts: case (lowercase) is the observed pattern of affixes on nouns which indicate grammatical function, while Case (uppercase) refers to a theoretical (formal) feature of syntactic objects—certain lexical items “assign Case” or “receive Case” and its mismatches were ruled out in GB syntax by the Case Filter. You’ll find GB linguistics papers referring to “case” when discussing Mandarin Chinese, for example, a language that doesn’t have any overt case (lowercase) and you’ll know immediately that this usage is an uppercase Case case. In this blog post I’ll be dealing primarily with the former descriptive notion. 

  2. When I refer to “strongly case-marking languages,” I am referring to languages with a non-trivial inventory of cases (not just nominative, accusative, and genitive) and where a noun phrase’s case is not reflected on determiners. For example, German is excluded by this definition as case is realized exclusively on articles and there is no need to find and parse the noun head itself to identify its case—more information on German is in the section “finding the edges.” 

Adding Your Language to Ubiquity Parser 2

Wednesday, April 29th, 2009

NOTE: This blog post has now been added to the Ubiquity wiki and is updated there. Please disregard this article and instead follow these instructions.

You’ve seen the video. You speak another language. And you’re wondering, “how hard is it to add my language to Ubiquity with Parser 2?” The answer: not that hard. With a little bit of JavaScript and knowledge of and interest in your own language, you’ll be able to get at least rudimentary Ubiquity functionality in your language. Follow along in this step by step guide and please submit your (even incomplete) language files!

As Ubiquity Parser 2 evolves, there is a chance that this specification will change in the future. Keep abreast of such changes on the Ubiquity Planet and/or this blog (RSS).

(more…)

Command Chaining with Oni?

Wednesday, April 29th, 2009

There are two challenges to implementing so-called command chaining, but only one of them is choosing a linguistically appropriate structural standard and parsing it. The other is the underlying difficulty of processing each individual “clause” in sequence, asynchronously. Alex Fritze blogged about how a project like his own Oni could dramatically simplify this underlying process.

Ubiquity, Oni, and Composability:

but I cannot instruct it to give me list of translated google results:
translate (google foo) to German  // doesn't work
Or email me the resulting list:
email(translate (google foo) to German) // doesn't work
…So how does Oni relate to this? Oni is a browser-based “embedded structured concurrency framework”. It allows you to write asynchronous code as if it was synchronous, adding back the kind-of composibility that is lost when juggling concurrent strands of execution (such as e.g. pending XMLHttpRequests) with ‘conventional’ sequential languages.

Scoring for Optimization

Friday, April 24th, 2009

Suppose you have a number of competing candidates, each of which can be ranked with a score, but it takes a little time to calculate each candidate’s score. You’re only interested in the top n candidates. You want to come up with a scoring scheme where you can throw the extra candidates out of consideration earlier without sacrificing quality. Such is the problem of scoring and ranking suggestions in Ubiquity. What properties must such a scoring system have?

This blog post includes a lot of complex CSS-formatted graphs which may be best viewed in — what else? — Firefox. You may also want to access this blog post directly rather than through a planet.

[Graph: candidates listed in the order 8, 2, 9, 3, 10, 5, 1, 7, with the cutoff marked at candidate 10.]

One portion of the problem description above merits clarification: I define “without sacrificing quality” to mean that, if we did not throw out any candidates early and waited until all the scores were computed fully and accurately, we would still get the same top n winners. This already gives us the key insight towards an appropriate solution: we can only throw out a candidate when we know that it has no further chance of making it into the top n candidates.
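
That insight can be sketched in a few lines. The snippet below is a minimal illustration under assumptions of my own, not Ubiquity's actual scoring code: each candidate carries a hypothetical scoreSoFar and a hypothetical maxRemaining (an upper bound on what the uncomputed parts could still add), and partial scores can only grow as more components are computed.

```javascript
// Drop every candidate whose best possible final score cannot reach the
// n-th best score already secured. Assumes scores only grow, so scoreSoFar
// is a lower bound and scoreSoFar + maxRemaining an upper bound.
// Both field names are hypothetical.
function pruneCandidates(candidates, n) {
  if (candidates.length <= n) {
    return candidates.slice();
  }
  // n-th largest score so far: n candidates are guaranteed to end at or above it.
  var floors = candidates
    .map(function(c) { return c.scoreSoFar; })
    .sort(function(a, b) { return b - a; });
  var threshold = floors[n - 1];
  // Keep a candidate only if its best case could still reach the threshold.
  return candidates.filter(function(c) {
    return c.scoreSoFar + c.maxRemaining >= threshold;
  });
}

var candidates = [
  { name: 'A', scoreSoFar: 60, maxRemaining: 10 },
  { name: 'B', scoreSoFar: 50, maxRemaining: 30 },
  { name: 'C', scoreSoFar: 20, maxRemaining: 20 },
  { name: 'D', scoreSoFar: 30, maxRemaining: 40 }
];

var survivors = pruneCandidates(candidates, 2).map(function(c) { return c.name; });
console.log(survivors.join(', '));
```

With n = 2, candidate C can reach at most 40 while two candidates already hold at least 50, so C is dropped. D currently trails but could still reach 70, so it has to stay in the running.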

(more…)

A Demonstration of Ubiquity Parser 2

Friday, April 24th, 2009

Here’s a quick demonstration of Ubiquity Parser 2, aka “the new parser.” I’ll show you how you can use the parser yourself and point out some highlights of the new functionality.


Ubiquity Parser 2: better noun-first suggestions and command localization from mitcho on Vimeo.

(more…)

Attachment Ambiguity—or—when is the gyudon cheap?

Wednesday, April 15th, 2009

yoshinoya.jpg

Every day on the way to work I walk by a fine establishment known as Yoshinoya (吉野家), Japan’s largest gyudon (牛丼) chain restaurant. For those of you whose lives have yet to be graced by gyudon, it’s a bowl of rice topped with beef and onions stewed in a sweet-savory soy-based sauce. Loving gyudon and being a cheapskate, I naturally noticed the recent 50 yen off gyudon promotion at Yoshinoya. The photo above shows part of that sign.

Part of this sign, though, made me think about our new Ubiquity parser. In particular, it was the attachment ambiguity in the end date of the promotion. The text in the photo above literally is “April 15th (Wed.) 8PM until”. (Note that Japanese is a strongly head-final language, and that the “until” is a postposition.) There are two possible readings for this expression, as illustrated by the two composition trees below.

(more…)

Count command for Ubiquity

Monday, April 13th, 2009

(This is primarily a blog post to test out Sandro’s plugin for embedding Ubiquity commands in WordPress. If you don’t see the “subscribe to command” come up, make sure you’re looking at the single page view.)

A while back I created a count command for Ubiquity to count HTML elements on a page, so I’ll share it here. The idea is super simple: select some text on your page and execute count p to get the number of paragraphs, or count a to get the number of links, or count tr to get the number of table rows. This is super useful when reading articles with charts or lists online and you want to know how many there are without doing something like copy-pasting into Excel.

The count command is built using jQuery so it can even understand targets like p.class or a[href=...]. Give it a try! ^^

Rolling out the Roles

Thursday, April 9th, 2009

Jono and I have recently been working to incorporate the Parser The Next Generation into Ubiquity proper, and this of course involves the process of retooling the standard commands with semantic roles. The first step, however, is to come up with a list of universal semantic roles which the verbs will be rewritten to use and individual languages’ parsers will be built to identify. Today I have just such a proposal.

(more…)

Scoring and Ranking Suggestions

Tuesday, April 7th, 2009

I just spent some time reviewing how Ubiquity currently ranks its suggestions in relation to Parser The Next Generation, so I thought I’d put some of these thoughts down in writing.

The issue of ranking Ubiquity suggestions can be restated as predicting an optimal output given a certain input and various conflicting considerations. Ubiquity (0.1.8, as of this writing) computes four “scores” for each suggestion:

(more…)

Foxkeh demos Ubiquity Parser: The Next Generation

Wednesday, April 1st, 2009

I just made a screencast with Foxkeh to show off the Ubiquity next generation parser demo and to demonstrate how easy it is to add your own language. Foxkeh wants you to localize the parser into your language. How could you say no? ^^


Foxkeh demos Ubiquity Parser: The Next Generation from mitcho on Vimeo.

There are some details which are not covered in this introductory video, such as how to deal with case marking languages or languages without spaces. Hopefully this’ll inspire some people to play with the demo, though. I’d love to hear your comments! ^^


© 2006-2010 mitcho (Michael 芳貴 Erlewine).
The views expressed on these pages are mine alone and do not
reflect those of my employers and clients, past and present.