
Posts Tagged ‘Mandarin’

Testing Google’s Language Detection

Saturday, May 17th, 2008


As Google adds ten more languages to its machine translation service, it seems to be on its way to becoming the most convenient universal translator of the world’s popular languages. Google’s handling of languages isn’t perfect, of course. In particular, I’ve been complaining to friends for a while about how poorly Google handles queries written in Chinese characters (漢字/汉字). In this post, I run some tests using Google’s Language Detection service to better understand how it handles Chinese character queries.

Background

Chinese characters have been used all across East Asia, most notably in Chinese, Japanese, Korean, and Vietnamese (the “CJKV”). Prescriptivist writing reforms in Communist China and Japan have simplified many characters, though. Some characters were simplified in the same way, some in different ways, and some in only one of the two countries. For example, 國 became 国 in both countries, while 廣 became 广 in China but 広 in Japan. For more information, there’s Wikipedia or Ken Lunde’s CJKV Information Processing.

The problem

The issue comes up when you search for a word in Chinese characters that clearly comes from one particular Chinese character-using language. In my experience, Google doesn’t infer from the query which language you are using, and returns many results in other Chinese character-using languages as well.[^1]
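
To make the ambiguity concrete, here is a minimal sketch of a script-based guess. It is not how Google’s service works; the function name `guess_cjk_language` and its return labels are made up for illustration, and it only looks at which Unicode blocks a query uses. Kana or hangul give the language away, but a query made up only of Chinese characters stays ambiguous.

```python
def guess_cjk_language(query: str) -> str:
    """Rough guess at a query's language from the Unicode blocks it uses.

    Hypothetical helper for illustration only; real language detection
    needs far more than a block check.
    """
    has_han = False
    for ch in query:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:
            # Hiragana (U+3040-309F) or Katakana (U+30A0-30FF)
            return "Japanese"
        if 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF:
            # Hangul syllables or Hangul jamo
            return "Korean"
        if 0x4E00 <= cp <= 0x9FFF:
            # CJK Unified Ideographs (the basic block)
            has_han = True
    # Han characters alone could be Chinese, Japanese, or older Korean/Vietnamese text
    return "ambiguous (Han only)" if has_han else "unknown"


if __name__ == "__main__":
    for q in ["広い", "한국어", "漢字", "广", "hello"]:
        print(q, "->", guess_cjk_language(q))
```

A check like this can only rule languages in, never out: a Han-only query such as 漢字 is valid in more than one language, so anything better has to look at which character variants and which words appear.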


Linguistics in 嘉義

Tuesday, May 13th, 2008

A couple of weeks ago I went to Chiayi (嘉義, pinyin: Jiāyì) to present a paper at the Linguistic Society of Taiwan’s National Conference on Linguistics.[^1] I got a chance to meet some wonderful and kind Taiwanese linguists, make friends with some linguistics students, and explore the city of Chiayi.


I’m busy to die

Tuesday, October 30th, 2007

Today at work: the military guy who has quite good English told me that he was very busy as our school is being observed next week by administrators. He then told me, “I’m busy to die.”

While I originally thought he might have mispronounced “today,” he obviously knows that word… I believe he was trying to say “我忙死了,” a Mandarin resultative construction which could be translated “I’m busy to the extent that I will die.” Obviously this is not literal… V+死 compounds are a common form of exaggeration. It was a neat instance of grammatical transfer, though.