
Posts Tagged ‘ubiquity’

Report from SIGIR Workshop on Information Access in a Multilingual World

Friday, July 24th, 2009

Yesterday I participated in and presented at a workshop on Information Access in a Multilingual World at ACM SIGIR in Boston. The focus of the workshop was on cross-language information retrieval (CLIR). Cross-language information retrieval systems enable users to retrieve relevant information across different languages for a certain task or query. Even if you have a budget to translate some documents from a foreign language to your language, how do you find the relevant documents to translate in the first place if you don’t speak (or read) that source language? This is the type of problem that CLIR aims to solve.

(more…)

Ubiquity Localization: What’s New, What’s Next

Thursday, July 9th, 2009

Yesterday we released Ubiquity 0.5, a major update to the already popular Ubiquity platform. Among numerous other features, Ubiquity 0.5 includes the first fruit of months of research on building a multilingual parser and natural language interface. In this blog post I’ll give a quick overview of new internationalization-related features in Ubiquity 0.5 as well as a quick roadmap of future considerations.

Of course, one of the best ways to learn about the new features is to experience them… try Ubiquity 0.5 now!


(more…)

Ubiquity presentation at Tokyo 2.0

Wednesday, June 10th, 2009

T2P0.PNG

This past Monday I presented at Tokyo 2.0, Japan’s largest bilingual web/tech community. I presented as part of a session on The Web and Language, which I also helped organize. Other presenters included Junji Tomita from goo Labs, Shinjyou Sunao of Knowledge Creation, developers of the Voice Delivery System API, and Chris Salzberg of Global Voices Online on community translation.

I just put together a video of my Ubiquity presentation, mixing the audio recorded live at the presentation together with a screencast of my slides for better visibility. The presentation is 10 minutes long and is bilingual, English and Japanese.


Ubiquity: Command the Web with Language 言葉で操作する Web from mitcho on Vimeo.

(more…)

Solving a Romantic Problem: Portmanteau’ed Prepositions

Monday, May 11th, 2009

The problem:

In many Romance languages, prepositions and articles often form portmanteau morphs, combining to form a single word.1 Some examples include (French) à + le > au, de + le > du, (Catalan) a + el > al, de + els > dels, per + el > pel. Italian has a particularly productive system of portmanteau’ed prepositions and articles… I refer you to the contraction article on Wikipedia.

As I noted a couple weeks ago, however, some combinations do not form portmanteaus.2
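As a rough illustration of the normalization idea, here is a minimal Python sketch that expands contracted forms back into a preposition plus an article before any further parsing. The table entries come from the examples above; the names `CONTRACTIONS` and `expand_contractions` are hypothetical and are not Ubiquity's actual code.

```python
# Toy lookup table: contracted form -> (preposition, article).
# Hypothetical data structure for illustration only, not Ubiquity's API.
CONTRACTIONS = {
    "fr": {"au": ("à", "le"), "du": ("de", "le")},
    "ca": {"al": ("a", "el"), "dels": ("de", "els"), "pel": ("per", "el")},
}


def expand_contractions(text, lang):
    """Replace each known contraction with its preposition and article."""
    table = CONTRACTIONS.get(lang, {})
    out = []
    for word in text.split():
        parts = table.get(word.lower())
        if parts:
            out.extend(parts)   # e.g. "au" becomes "à", "le"
        else:
            out.append(word)    # anything else passes through unchanged
    return " ".join(out)


print(expand_contractions("envoyer au professeur", "fr"))
print(expand_contractions("de la ville", "fr"))
```

Combinations that never contract, such as French de + la, pass through untouched, so after this step the parser only has to look for the plain preposition.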

(more…)


  1. Thanks to Jeremy O’Brien for helping me figure out how to refer to this phenomenon. 

  2. This also relates to the issue of parsing multi-word delimiters, though the argument normalization strategy covered here should reduce the necessity of multi-word delimiters. 

In Case of Case…

Wednesday, May 6th, 2009

A recent hot topic of discussion in the Ubiquity i18n realm has been how to deal with strongly case-marking languages. As we continue to make steady progress, this is one of the remaining open questions which we must decide as a community how to tackle in Parser 2.

Introduction

Grammatical case is a marking on nouns that expresses grammatical function. Not all languages exhibit case. In many of the Indo-European languages we hope to bring Ubiquity to, case is realized as a suffix.1

Here’s a classic example of case from Latin. (Line 2 is the gloss of 1, line 4 of 3.)

(1) canis      virum      momordit
(2) dog=sg.NOM man=sg.ACC bite=3sg.perfect
(3) vir        canem      momordit
(4) man=sg.NOM dog=sg.ACC bite=3sg.perfect

Example (1) is “the dog bit the man,” while example (3) is “the man bit the dog.” The only difference, as you see in the gloss, is that the nouns for “dog” and “man” carry different case endings in the two sentences. By marking the nouns with different cases (here, nominative and accusative), their semantic roles in the sentence (which is the biter and which is the bitee) can be identified unambiguously. (Their positions are also switched in these examples, but in reality Latin has a very free word order: the same sentences with other word orders, including OSV or VSO, are also common.)

At first glance, strongly case-marked languages may look like a godsend for identifying the semantic roles of arguments.2 If we can easily and unambiguously recognize arguments’ cases to put them in their appropriate semantic roles, this could simplify processing as well as make Ubiquity input follow a natural syntax for such languages. Unfortunately, there are some significant challenges which must be overcome in order to make the processing of case-markers worthwhile.
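To make the appeal concrete, here is a toy Python sketch that assigns roles purely from case, ignoring word order. It assumes a hand-written lexicon covering only the four noun forms in the Latin example; `LEXICON`, `ROLE_FOR_CASE`, and `assign_roles` are hypothetical names, and a real parser would have to recognize case endings on arbitrary nouns, which is exactly where the challenges begin.

```python
# Hypothetical lexicon for just the noun forms in the Latin example:
# surface form -> (meaning, case).
LEXICON = {
    "canis": ("dog", "NOM"),
    "canem": ("dog", "ACC"),
    "vir": ("man", "NOM"),
    "virum": ("man", "ACC"),
}

# Which semantic role each case signals for the verb "bite".
ROLE_FOR_CASE = {"NOM": "biter", "ACC": "bitee"}


def assign_roles(sentence):
    """Map each case-marked noun to a semantic role, whatever its position."""
    roles = {}
    for word in sentence.split():
        entry = LEXICON.get(word)
        if entry is None:
            continue  # not a known noun form, e.g. the verb "momordit"
        meaning, case = entry
        roles[ROLE_FOR_CASE[case]] = meaning
    return roles


print(assign_roles("canis virum momordit"))
print(assign_roles("virum canis momordit"))  # same roles, different word order
```

Because the roles come from the case marking alone, reordering the words does not change the result.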

(more…)


  1. Note that when linguists talk about “case,” they could be referring to two different (though related) concepts: case (lowercase) is the observed pattern of affixes on nouns which indicate grammatical function, while Case (uppercase) refers to a theoretical (formal) feature of syntactic objects—certain lexical items “assign Case” or “receive Case” and its mismatches were ruled out in GB syntax by the Case Filter. You’ll find GB linguistics papers referring to “case” when discussing Mandarin Chinese, for example, a language that doesn’t have any overt case (lowercase) and you’ll know immediately that this usage is an uppercase Case case. In this blog post I’ll be dealing primarily with the former descriptive notion. 

  2. When I refer to “strongly case-marking languages,” I am referring to languages with a non-trivial inventory of cases (not just nominative, accusative, and genitive) and where a noun phrase’s case is not reflected on determiners. For example, German is excluded by this definition as case is realized exclusively on articles and there is no need to find and parse the noun head itself to identify its case—more information on German is in the section “finding the edges.” 

Adding Your Language to Ubiquity Parser 2

Wednesday, April 29th, 2009

NOTE: This blog post has now been added to the Ubiquity wiki and is updated there. Please disregard this article and instead follow these instructions.

You’ve seen the video. You speak another language. And you’re wondering, “how hard is it to add my language to Ubiquity with Parser 2?” The answer: not that hard. With a little bit of JavaScript and knowledge of and interest in your own language, you’ll be able to get at least rudimentary Ubiquity functionality in your language. Follow along in this step by step guide and please submit your (even incomplete) language files!

As Ubiquity Parser 2 evolves, there is a chance that this specification will change in the future. Keep abreast of such changes on the Ubiquity Planet and/or this blog (RSS).

(more…)

Attachment Ambiguity—or—when is the gyudon cheap?

Wednesday, April 15th, 2009

yoshinoya.jpg

Every day on the way to work I walk by a fine establishment known as Yoshinoya (吉野家), Japan’s largest gyudon (牛丼) chain restaurant. For those of you whose lives have yet to be graced by gyudon, it’s a bowl of rice topped with beef and onions stewed in a sweet-savory soy-based sauce. Loving gyudon and being a cheapskate, I naturally noticed the recent 50 yen off gyudon promotion at Yoshinoya. The photo above shows part of that sign.

Part of this sign, though, made me think about our new Ubiquity parser. In particular, it was the attachment ambiguity in the end date of the promotion. The text in the photo above literally is “April 15th (Wed.) 8PM until”. (Note that Japanese is a strongly head-final language, and that the “until” is a postposition.) There are two possible readings for this expression, as illustrated by the two composition trees below.

(more…)

Scoring and Ranking Suggestions

Tuesday, April 7th, 2009

I just spent some time reviewing how Ubiquity currently ranks its suggestions in relation to Parser The Next Generation, so I thought I’d put some of these thoughts down in writing.

The issue of ranking Ubiquity suggestions can be restated as predicting an optimal output given a certain input and various conflicting considerations. Ubiquity (1.8, as of this writing) computes four “scores” for each suggestion:

(more…)

Where’s The Verb?

Wednesday, March 25th, 2009

Ubiquity’s proposed new parser design is based on a principles and parameters philosophy: we can build an underlying universal parser and, for each individual language, we simply set some “parameters” to tell the parser how to act. As we consider the design’s pros and cons, it’s important to reflect back on the linguistic data and see if this architecture can adequately handle the range of linguistic data attested in our languages.

Today I’ll highlight some disparate typological data to help us understand these questions: where’s the verb? and what does the verb look like? (more…)
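The principles and parameters idea can be sketched in a few lines of Python: one shared parsing routine, plus a small per-language settings object. This is only an illustration with a single made-up parameter (`verb_final`); the names `LanguageParameters`, `parse`, `HEAD_INITIAL`, and `HEAD_FINAL` are hypothetical and do not reflect the actual Parser 2 design.

```python
from dataclasses import dataclass


@dataclass
class LanguageParameters:
    """Per-language settings consumed by the shared parser (hypothetical)."""
    verb_final: bool  # True if the verb comes last in the input


def parse(tokens, params):
    """Universal routine: only the parameters differ between languages."""
    if not tokens:
        return None
    if params.verb_final:
        return {"verb": tokens[-1], "args": tokens[:-1]}
    return {"verb": tokens[0], "args": tokens[1:]}


# Two illustrative settings: a verb-initial and a verb-final language.
HEAD_INITIAL = LanguageParameters(verb_final=False)
HEAD_FINAL = LanguageParameters(verb_final=True)

print(parse(["map", "Tokyo"], HEAD_INITIAL))
print(parse(["Tokyo", "map"], HEAD_FINAL))
```

Both calls produce the same verb and argument split, which is the point: the language-specific knowledge lives in the parameters rather than in the parsing code.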

Automating the Linguist’s Job

Tuesday, March 24th, 2009

At the end of my blog post yesterday I hinted at an exciting possible approach to Ubiquity’s localization:

In the future we ideally could build a web-based system to collect these “utterances.” We could … generate parser parameters based on those sentences. That would essentially reduce the parser-construction process to a more run-of-the-mill string translation process.

If we build this type of “command-bank” of common Ubiquity input translated into various languages, we could build a tool to learn various features of each language and generate each parser, essentially learning the language based on data. Today I’ll elaborate on how I believe this could be possible, by analogy to another language learning device: the human.
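As a very rough sketch of what learning a parameter from such a command-bank might look like, the Python below guesses whether a language puts the verb first or last by counting where known verbs appear in translated utterances. The function name, the sample utterances, and the verb list are all made up for illustration; the approach is an assumption, not a description of any existing Ubiquity tool.

```python
from collections import Counter


def infer_verb_position(utterances, known_verbs):
    """Guess 'initial' or 'final' from where known verbs show up."""
    counts = Counter()
    for utterance in utterances:
        tokens = utterance.split()
        if not tokens:
            continue
        if tokens[0] in known_verbs:
            counts["initial"] += 1
        elif tokens[-1] in known_verbs:
            counts["final"] += 1
    if not counts:
        return None  # no evidence either way
    return counts.most_common(1)[0][0]


# Hypothetical command-bank entries for two imaginary languages.
verbs = {"map", "translate"}
print(infer_verb_position(["map Tokyo", "translate hello"], verbs))
print(infer_verb_position(["Tokyo map", "hello translate"], verbs))
```

A real system would need far more data and would have to learn the verbs themselves, but the shape of the problem is the same: translated strings in, parser parameters out.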

(more…)

Ubiquity i18n: questions to ask

Monday, March 23rd, 2009

I have recently traveled a fair deal and have met many people excited about the Ubiquity project and its localization efforts. “I want to help,” they say, but many are unsure where to start.

As a linguist, studying a language involves looking at instances of that language as data. To this end, we as Ubiquity internationalizers need to get at some examples of target utterances. Here’s an example survey which could be a good starting point for native speakers who want to contribute information on their language, based on Blair’s list of common Ubiquity verbs.

(more…)

Ubiquity in Portuguese

Thursday, March 5th, 2009

Felipe, a Ubiquity user, put together a wonderful look at what Ubiquity might look like in Portuguese. He has some great points here particularly regarding the “map” verb used in English—Felipe points out that Portuguese does not have a very common “map” verb and that it would be much more common to use enter me dê (literally me give) to use a verb to request a map. This is a great example of how Jono’s overlord verbs proposal may be an important aspect of our i18n efforts. The post is also timely as we’ve recently been discussing in our regular meetings (open to all!) that Portuguese may/could be the focus of our next parser construction efforts.

What would the challenges be for Ubiquity in your language? We’d love to see an increasing number of blog posts on this topic in different languages. Thanks Felipe! ^^

Unnatural by design

Sunday, March 1st, 2009

I’m flying over the Pacific Ocean right now, but a little bit of language caught my eye. Here’s a picture of the menu for this flight, in three languages: English, Japanese, Chinese.

menu.jpg

What caught my eye is the line “served with ご一緒に 配,” meant to be read as part of “Beef in BBQ sauce… served with Pepsi…”. The Chinese 配 (pèi) is fine here, meaning “with,” but the Japanese “ご一緒に” (goissho-ni) seemed awkward to me.

(more…)

Localizing Ubiquity: an open letter to linguists

Thursday, February 26th, 2009


Localizing Ubiquity: an open letter to linguists from mitcho on Vimeo.

Below is a transcript of this video. Please distribute this video far and wide to anyone who may be interested. ^^

(more…)

Ubiquity in Firefox: Focus on Japanese

Friday, February 20th, 2009

One of the eventual goals of the Ubiquity project is to bring some of its functionality and ideas to Firefox proper. To this end, Aza has been exploring some possible options for what that would look like (round 1, round 2). All of his mockups, however, use English examples. I’m going to start exploring what Ubiquity in Firefox might look like in different kinds of languages. Let’s kick this off with my mother tongue, Japanese.1

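Going forward I’ll be looking at Ubiquity in Firefox for a variety of languages, and today I’m taking up Japanese. I plan to post the same content in Japanese at a later date. ^^ Comments in Japanese are also very welcome!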

(more…)


© 2006–2011 mitcho (Michael 芳貴 Erlewine).
The views expressed on these pages are mine alone and do not
reflect those of my employers and clients, past and present.