Ubiquity i18n: questions to ask

Mar 23, 2009

I have traveled a fair amount recently and met many people excited about the Ubiquity project and its localization efforts. “I want to help,” they say, but many are unsure where to start.

To a linguist, studying a language means looking at instances of that language as data. To this end, we as Ubiquity internationalizers need to collect examples of target utterances. Here is a sample survey, based on Blair’s list of common Ubiquity verbs, that could be a good starting point for native speakers who want to contribute information about their language.

A survey for Ubiquity localization

Instructions

How would you express the following commands in your language? The words in CAPITAL LETTERS do not need to be translated. Feel free to give multiple possible answers for each command.

Try to express the same command rather than forcing a literal translation. For example, if there is no “map” verb in your language, you could translate example (8) as “lookup a map of PLACE.” Please keep in mind that the addressee is a computer.

Basic word order / argument structure

1. search HELLO
2. search HELLO with google
3. translate HELLO from English to French
4. lookup the weather for PLACE
5. shop for SHOES with Amazon
6. email HELLO to Bill
7. email HELLO to ADDRESS
8. map PLACE
9. find HELLO
10. tab to HELLO or switch to HELLO tab

…

Pronominal/deictic arguments (aka “magic words”)

1. search this with google
2. translate this to French
3. bookmark this tab

…

How this data is used

Responses to these surveys would be used to identify salient features of the language, such as how it marks its arguments (for example, with adpositions, case marking, or word order) and whether commands tend to be verb-initial or verb-final. Individual case markers can be identified by comparing minimal pairs. For example, comparing items (1) and (2) tells us how google is marked in its instrumental role, while comparing item (2) with “magic word” example (1) lets us identify the appropriate “magic word” and determine whether the language uses any clitics.
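As a rough illustration of the minimal-pair comparison, here is a small Python sketch. It assumes each response is stored as a string with the CAPITALIZED placeholders left untranslated and with words separated by spaces. That assumption fails for languages written without spaces, such as Japanese, which would need a character-level or morphological split instead. All function names are hypothetical helpers, not part of Ubiquity.

```python
from __future__ import annotations

import difflib


def _opcodes(base: str, variant: str):
    """Token-level diff between two whitespace-tokenized utterances."""
    a, b = base.split(), variant.split()
    return b, difflib.SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes()


def added_material(base: str, extended: str) -> list[str]:
    """Tokens that appear in `extended` but have no counterpart in `base`."""
    b, ops = _opcodes(base, extended)
    return [tok for tag, _i1, _i2, j1, j2 in ops
            if tag in ("insert", "replace") for tok in b[j1:j2]]


def marker_for(argument: str, base: str, extended: str) -> list[str]:
    """Guess the marker on `argument`: the added tokens minus the argument itself."""
    return [tok for tok in added_material(base, extended) if tok != argument]


def magic_word(with_placeholder: str, with_deictic: str) -> list[str]:
    """Tokens that replace the placeholder (e.g. HELLO) in the deictic version."""
    b, ops = _opcodes(with_placeholder, with_deictic)
    return [tok for tag, _i1, _i2, j1, j2 in ops
            if tag == "replace" for tok in b[j1:j2]]


def verb_position(utterance: str, verb: str) -> str:
    """Classify a command as verb-initial, verb-final, or other."""
    tokens = utterance.split()
    if tokens and tokens[0] == verb:
        return "verb-initial"
    if tokens and tokens[-1] == verb:
        return "verb-final"
    return "other"
```

For instance, marker_for("google", "search HELLO", "search HELLO with google") returns ["with"], and an illustrative romanized Japanese pair, split into words by hand ("HELLO o kensaku" / "HELLO o google de kensaku"), yields ["de"]. When magic_word returns a changed verb form rather than a separate word, as with the Spanish pair "busca HELLO con google" / "búscalo con google", that hints that the language attaches a clitic to the verb.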

Data collection

Ideally, we could eventually build a web-based system to collect these “utterances.” We could also use such a system to automatically test our parsers in different languages against the sentences in the resulting command bank, or even to generate parser parameters from those sentences. That would essentially reduce parser construction to a more run-of-the-mill string-translation process.
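To sketch how such automated testing might work, here is a minimal Python harness. The command-bank format, the role labels, the ENGLISH_MARKERS table, and the toy parse function are all hypothetical stand-ins. A real harness would call whichever Ubiquity parser is under test rather than this simplified verb-initial, preposition-marked parser.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Parse:
    verb: str
    arguments: dict[str, str] = field(default_factory=dict)  # role -> argument text


# Hypothetical command bank: (utterance, expected parse) pairs from survey responses.
COMMAND_BANK: list[tuple[str, Parse]] = [
    ("search HELLO", Parse("search", {"object": "HELLO"})),
    ("search HELLO with google",
     Parse("search", {"object": "HELLO", "instrument": "google"})),
    ("translate HELLO from English to French",
     Parse("translate", {"object": "HELLO", "source": "English", "goal": "French"})),
    ("email HELLO to Bill", Parse("email", {"object": "HELLO", "goal": "Bill"})),
]

# Hypothetical per-language "parser parameters": marker word -> semantic role.
ENGLISH_MARKERS: dict[str, str] = {"with": "instrument", "from": "source", "to": "goal"}


def parse(utterance: str, markers: dict[str, str]) -> Parse:
    """Toy parser: the first token is the verb; each marker word starts a new argument."""
    verb, *rest = utterance.split()
    args: dict[str, str] = {}
    role = "object"  # unmarked material right after the verb is treated as the object
    for tok in rest:
        if tok in markers:
            role = markers[tok]
        else:
            args[role] = f"{args[role]} {tok}" if role in args else tok
    return Parse(verb, args)


def run_bank(bank: list[tuple[str, Parse]], markers: dict[str, str]) -> list[str]:
    """Return the utterances whose parse does not match the expected parse."""
    return [utt for utt, expected in bank if parse(utt, markers) != expected]
```

Here run_bank(COMMAND_BANK, ENGLISH_MARKERS) returns an empty list, while dropping "with" from the table makes "search HELLO with google" fail. In this framing the language-specific input shrinks to a small table of markers, which is the sense in which parser construction could start to resemble string translation.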