Ubiquity i18n: questions to ask

I have traveled a fair amount recently and have met many people excited about the Ubiquity project and its localization efforts. “I want to help,” they say, but many are unsure where to start.

For a linguist, studying a language means looking at instances of that language as data. To this end, we as Ubiquity internationalizers need to collect some examples of target utterances. Here’s an example survey, based on Blair’s list of common Ubiquity verbs, which could be a good starting point for native speakers who want to contribute information on their language.

A survey for Ubiquity localization


How would you express the following commands in your language? The words in CAPITAL LETTERS do not need to be translated. Feel free to give multiple possible answers for each command.

Try to express the same command rather than forcing a “literal translation”; for example, if there’s no “map” verb in your language, you could translate example (8) as lookup a map of PLACE. Please keep in mind that the [[addressee]] is a computer.

Basic word order / argument structure

  1. search HELLO
  2. search HELLO with google
  3. translate HELLO from English to French
  4. lookup the weather for PLACE
  5. shop for SHOES with Amazon
  6. email HELLO to Bill
  7. email HELLO to ADDRESS
  8. map PLACE
  9. find HELLO
  10. tab to HELLO or switch to HELLO tab

Pronominal/deictic arguments (aka “magic words”)

  1. search this with google
  2. translate this to French
  3. bookmark this tab

How this data is used

Responses to these surveys would be used to identify certain salient features of the language, such as how the language codes for its arguments (for example using [[adpositions]], [[case marking]], or word order) and whether commands tend to be verb-initial or verb-final. Individual case markings, for example, can be identified by comparing [[minimal pairs]]. By comparing items (1) and (2), we can learn how google in an instrumental role is marked. By comparing item (2) and the “magic word” example (1), we can identify the appropriate “magic word” and determine whether the language uses any [[clitics]].
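
The minimal-pair comparison described above could be partly automated. The following is only a rough sketch, not anything that exists in Ubiquity: the function name `diff_pair` is made up for illustration, and it assumes that splitting on whitespace gives usable tokens, which will not hold for languages written without spaces or for markers attached directly to a word.

```python
from difflib import SequenceMatcher


def diff_pair(a: str, b: str) -> list[tuple[str, list[str], list[str]]]:
    """Return the token-level differences between two survey responses.

    Each entry is (operation, tokens_from_a, tokens_from_b), where the
    operation is 'insert', 'delete', or 'replace'.
    Assumption: whitespace tokenization is good enough for the language.
    """
    tokens_a, tokens_b = a.split(), b.split()
    matcher = SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    return [
        (tag, tokens_a[i1:i2], tokens_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the parts that differ
    ]


# Items (1) and (2): whatever is added marks the instrumental argument.
print(diff_pair("search HELLO", "search HELLO with google"))

# Item (2) and "magic word" item (1): whatever replaces HELLO is the magic word.
print(diff_pair("search HELLO with google", "search this with google"))
```

Run on the English survey items, the first comparison isolates `with google` and the second isolates `this` as the replacement for `HELLO`. For other languages a linguist would still need to review the output, since a diff like this cannot tell a clitic from an independent word.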

Data collection

Ideally, in the future we could build a web-based system to collect these “utterances.” We could also use such a system to automatically test our parsers in different languages against the sentences in the command-bank, or ultimately even generate parser parameters based on those sentences. That would essentially reduce the parser-construction process to a more run-of-the-mill string translation process.
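
To make the idea of testing parsers against a command-bank more concrete, here is a minimal sketch. Everything in it is hypothetical: the command-bank format, the role names (`verb`, `object`, `instrument`), `run_bank`, and the toy English parser are stand-ins for illustration and do not reflect Ubiquity's actual parser interface.

```python
from typing import Callable

Parse = dict[str, str]
Parser = Callable[[str], Parse]

# Hypothetical command-bank: (language code, utterance, expected parse).
COMMAND_BANK: list[tuple[str, str, Parse]] = [
    ("en", "search HELLO with google",
     {"verb": "search", "object": "HELLO", "instrument": "google"}),
    ("en", "map PLACE", {"verb": "map", "object": "PLACE"}),
]


def toy_english_parser(utterance: str) -> Parse:
    """Stand-in parser: the verb comes first, 'with' marks the instrument."""
    verb, _, rest = utterance.partition(" ")
    obj, found_with, instrument = rest.partition(" with ")
    parse = {"verb": verb, "object": obj}
    if found_with:
        parse["instrument"] = instrument
    return parse


def run_bank(parsers: dict[str, Parser],
             bank: list[tuple[str, str, Parse]]) -> list[tuple[str, str, Parse, Parse]]:
    """Return (language, utterance, expected, actual) for every mismatch.

    Languages without a registered parser are skipped.
    """
    failures = []
    for lang, utterance, expected in bank:
        parser = parsers.get(lang)
        if parser is None:
            continue
        actual = parser(utterance)
        if actual != expected:
            failures.append((lang, utterance, expected, actual))
    return failures


failures = run_bank({"en": toy_english_parser}, COMMAND_BANK)
print(f"{len(failures)} failing sentence(s)")
```

Each new survey response would become another row in the bank, so a contributor who only supplies translated sentences still ends up extending the test coverage for that language.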