mitcho Michael 芳貴 Erlewine

Linguist. Fifth year PhD student at MIT.

blog

Posts Tagged ‘Mandarin’

Spring is for Speaking: JSConf, WordCamp SF, IACL

Saturday, March 20th, 2010

I recently confirmed three very exciting speaking gigs for this spring:

(more…)

Exploring Command Chaining in Ubiquity: Part 2

Sunday, August 23rd, 2009

Introduction

I have recently begun giving serious thought to what command chaining might look like in Ubiquity and what it would take to make it happen. The “command chaining,” or “piping,” described here always involves (at least) two verbs acting sequentially on a passed target: the first command performs some action or lookup, and the second command acts on the first command’s output.
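As a rough illustration of that definition, here is a minimal Python sketch of two verbs chained together. It is not Ubiquity code: the `lookup` and `shout` verbs, the toy dictionary, and the `chain` helper are all hypothetical names made up for this example.

```python
from typing import Callable

# A "command" here is just a function from a text target to a text result.
Command = Callable[[str], str]


def lookup(word: str) -> str:
    """Hypothetical first verb: look a word up in a toy dictionary."""
    toy_dictionary = {"neko": "cat", "inu": "dog"}
    # Unknown words are passed through unchanged.
    return toy_dictionary.get(word, word)


def shout(text: str) -> str:
    """Hypothetical second verb: upper-case whatever it is given."""
    return text.upper()


def chain(first: Command, second: Command) -> Command:
    """Build a command that feeds the output of `first` into `second`."""
    def chained(target: str) -> str:
        return second(first(target))
    return chained


lookup_then_shout = chain(lookup, shout)
print(lookup_then_shout("neko"))  # CAT
```

The second verb never sees the original target, only the first verb’s output, which is the sense of “piping” used here.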

A few days ago I penned some initial technical considerations regarding command chaining. In this post I’ll point out some linguistic considerations involved in supporting a natural syntax for chaining.

(more…)

Three ways to argue over arguments

Wednesday, February 18th, 2009

UPDATE: Contribute information on how your language identifies its arguments here.

When we execute a command in Ubiquity, in very simple terms, we’re hoping to do something (a verb) to some arguments (the nouns). Every sentence in every language uses some method to encode which arguments correspond to which roles of the verb. Here are a couple examples:

  1. He sees Mary.
  2. 彼が Maryを 見る。 (Kare-ga Mary-o miru.)

As speakers of English, you can read sentence (1) above and know exactly who is doing the seeing and who is being seen, and speakers of Japanese can get the same information from (2). How do different languages code for arguments in different roles? There are, broadly speaking, three different ways:

[Figure: three ways to code for arguments in different roles]

We’ll take a brief look today at these three different strategies, all of which a localizable natural language interface will surely encounter.

(more…)

Testing Google’s Language Detection

Saturday, May 17th, 2008

As Google adds ten more languages to its machine translation service, it seems to be on its way to becoming the most convenient universal translator of the world’s popular languages. Google’s handling of languages isn’t perfect, of course. In particular, I’ve been complaining to friends for a while about how poorly Google handles queries in Chinese character (漢字/汉字) scripts. In this post, I run some tests using Google’s Language Detection service to try to better understand its handling of Chinese character queries.

Background

Chinese characters have been used all across East Asia, most notably in Chinese, Japanese, Korean, and Vietnamese (the “CJKV”). Prescriptivist writing reforms in Communist China and Japan have simplified many characters, though. Some characters were simplified in the same way, some in different ways, and some in only one country but not the other. For more information, there’s Wikipedia or Ken Lunde’s CJKV Information Processing.
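A minimal Python sketch of those cases, using a few well-known characters chosen here for illustration (they are not taken from the tests in this post). The `classify` function and its labels are hypothetical names for this example only.

```python
# Each entry: (traditional form, PRC simplified form, Japanese shinjitai form).
# The characters are illustrative picks, not data from the tests in this post.
VARIANTS = [
    ("國", "国", "国"),  # simplified the same way in both countries
    ("廣", "广", "広"),  # simplified differently in each country
    ("門", "门", "門"),  # simplified in China only
    ("假", "假", "仮"),  # simplified in Japan only
]


def classify(traditional: str, simplified: str, shinjitai: str) -> str:
    """Label how one character fared under the two writing reforms."""
    changed_in_china = simplified != traditional
    changed_in_japan = shinjitai != traditional
    if changed_in_china and changed_in_japan:
        return "same" if simplified == shinjitai else "different"
    if changed_in_china:
        return "China only"
    if changed_in_japan:
        return "Japan only"
    return "unchanged"


for forms in VARIANTS:
    print(" / ".join(forms), "->", classify(*forms))
```

A form such as 国 on its own is therefore ambiguous between Chinese and Japanese, while 广 or 広 points to one language.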

The problem

The issue comes up when you search for a word in Chinese characters which clearly comes from one particular Chinese character-using language. In my experience, Google doesn’t infer from the query which language you are using, and returns many results in other Chinese character-using languages as well.[^1]

(more…)

Linguistics in 嘉義

Tuesday, May 13th, 2008

A couple weeks ago I went to Chiayi (嘉義, pinyin: Jiāyì) to present a paper at the Linguistic Society of Taiwan’s National Conference on Linguistics.[^1] I got a chance to meet some wonderful and kind Taiwanese linguists, make friends with some linguistics students, and explore the city of Chiayi.

(more…)

I’m Busy to Die

Tuesday, October 30th, 2007

Today at work: the military guy who has quite good English told me that he was very busy as our school is being observed next week by administrators. He then told me, “I’m busy to die.”

While I originally thought he might have mispronounced “today,” he obviously knows that word… I believe he was trying to say “我忙死了,” a Mandarin resultative construction which could be translated “I’m busy to the extent that I will die.” Obviously this is not literal… V+死 compounds are a common form of exaggeration. It was a neat instance of grammatical transfer, though.