mitcho Michael 芳貴 Erlewine

Linguist. Fifth year PhD student at MIT.

Posts Tagged ‘Chinese’

Exploring Command Chaining in Ubiquity: Part 2

Sunday, August 23rd, 2009

Introduction

I have recently begun giving serious thought to what command chaining might look like in Ubiquity and what it would take to make it happen. The “command chaining,” or “piping,” described here always involves at least two verbs acting sequentially on a passed target: the first command performs some action or lookup, and the second command acts on the first command’s output.
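As a rough illustration of that definition, here is a minimal Python sketch of two commands chained on a passed target. It is not Ubiquity code: `translate`, `shout`, `chain`, and the toy lookup table are all hypothetical names invented for this example.

```python
from typing import Callable, List

# Hypothetical stand-ins for two verbs; neither is a real Ubiquity command.
def translate(text: str) -> str:
    """First command: a lookup in a toy table (an assumption for illustration)."""
    table = {"hello": "bonjour"}
    return table.get(text, text)

def shout(text: str) -> str:
    """Second command: acts on whatever the first command produced."""
    return text.upper()

def chain(commands: List[Callable[[str], str]], target: str) -> str:
    """Run each command in order, passing each output on as the next input."""
    result = target
    for command in commands:
        result = command(result)
    return result

print(chain([translate, shout], "hello"))  # prints BONJOUR
```

The only point of the sketch is the data flow: the second verb never sees the original target, only the first verb's output.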

A few days ago I penned some initial technical considerations regarding command chaining. In this post I’ll point out some linguistic considerations involved in supporting a natural syntax for chaining.

(more…)

Ubiquity interview in Chinese

Monday, June 1st, 2009


Mitcho on Ubiquity i18n / l10n on YouTube

This past weekend was Mozilla Party JP 10 here in Japan and one of the speakers was Bob Chao (趙柏強) of Creative Commons Taiwan and MozTW. We got to talking in Chinese and he got a video interview of me talking about Ubiquity and our upcoming Parser 2 and the challenges of localization. I’ve never talked about my Mozilla work in Chinese before so it was definitely a challenge and I stumbled a lot, but hopefully some of the ideas got through. :)

前天我參加了Mozilla Party 10,一個日本 Mozilla 社群的會議。我在 Mozilla Party 才認識台灣 Mozilla 社群的趙柏強,我們就開始講國語。因為我自己很想念用中文,所以我非常高興有這個機會跟他談話。以後他拍一端 video,我給台灣的 Mozilla fans 把 Ubiquity 介紹一下。我的中文真的亂七八糟,大家對不起喔。 ^^;

Three ways to argue over arguments

Wednesday, February 18th, 2009

UPDATE: Contribute information on how your language identifies its arguments here.

When we execute a command in Ubiquity, in very simple terms, we’re hoping to do something (a verb) to some arguments (the nouns). Every sentence in every language uses some method to encode which arguments correspond to which roles of the verb. Here are a couple examples:

  1. He sees Mary.
  2. 彼が Maryを 見る。 (Kare-ga Mary-o miru.)

As a speaker of English, you can read sentence (1) above and know exactly who is doing the seeing and who is being seen, and speakers of Japanese can get the same information from (2). How do different languages code for arguments in different roles? There are, broadly speaking, three different ways:

three ways to code for arguments in different roles

We’ll take a brief look today at these three different strategies, all of which a localizable natural language interface will surely encounter.

(more…)

The 北京话儿 Beijing Pirate T-shirt

Saturday, June 14th, 2008

Speaking of t-shirts, I’d been toying with a t-shirt idea for the past year or two: a Beijing Pirate t-shirt. Let me explain…

A distinctive feature of the Beijing dialect of Mandarin (and, indeed, most northern Chinese dialects) is the very frequent rhoticization (adding “arr” to the end of a word, or replacing the end with it), whose function is often glossed as a diminutive suffix. This phenomenon is called 儿化 (érhuà) in Chinese. Here are some examples, blatantly stolen from Wikipedia:

  • 公园(gōngyuán)(public garden) → 公园儿(gōngyuánr), pronounced “gōngyuár”
  • 小孩(xiǎohái) (small child) → 小孩儿(xiǎoháir), pronounced “xiǎohár”
  • 事 (shì) (thing) → 事儿(shìr), pronounced “shèr”

The result of this variation is that it makes you sound like a pirate… and thus my t-shirt idea was born:

Beijing Pirate shirt

(more…)

Testing Google’s Language Detection

Saturday, May 17th, 2008

As Google adds ten more languages to its machine translation service, it seems to be on its way to becoming the most convenient universal translator of the world’s popular languages. Google’s handling of languages isn’t perfect, of course. In particular, I’ve been complaining to friends for a while about the weaknesses of Google’s handling of queries in Chinese character (漢字/汉字) scripts. In this post, I run some tests using Google’s Language Detection service to try to better understand its handling of Chinese character queries.

Background

Chinese characters have been used all across East Asia, most notably in Chinese, Japanese, Korean, and Vietnamese (“CJKV”). Prescriptivist writing reforms in Communist China and Japan have simplified many characters, though. Some characters were simplified in the same way, some in different ways, and some in only one country but not the other. For more information, there’s Wikipedia or Ken Lunde’s CJKV Information Processing.

The problem

The issue comes up when you try to search for a word in Chinese characters which clearly came from one Chinese character-using language. From my experience, Google doesn’t use the query to work out which language you are searching in, and returns many results in other Chinese character-using languages as well.[^1]
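To see why such queries are hard to attribute, here is a minimal Python sketch of a script-based guess. It is not how Google’s Language Detection works; `script_hint` is a hypothetical helper, and the Unicode ranges it checks (the Hiragana and Katakana blocks, Hangul syllables, and the basic CJK Unified Ideographs block) are a deliberately coarse assumption.

```python
def script_hint(query: str) -> str:
    """Return a coarse language guess based only on which scripts appear."""
    # Hiragana and Katakana blocks: U+3040 to U+30FF
    has_kana = any("\u3040" <= ch <= "\u30ff" for ch in query)
    # Hangul syllables: U+AC00 to U+D7A3
    has_hangul = any("\uac00" <= ch <= "\ud7a3" for ch in query)
    # Basic CJK Unified Ideographs block: U+4E00 to U+9FFF
    has_han = any("\u4e00" <= ch <= "\u9fff" for ch in query)

    if has_kana:
        return "ja"
    if has_hangul:
        return "ko"
    if has_han:
        # Chinese characters alone do not say which language they belong to.
        return "ambiguous"
    return "unknown"

print(script_hint("勉強する"))  # prints ja
print(script_hint("勉強"))      # prints ambiguous
```

A query containing kana can be pinned to Japanese, but a query written only in Chinese characters gives the script check nothing to go on, which is exactly the case described above.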

(more…)

Taipei find: a dictionary of Chinese-Japanese false cognates

Saturday, March 22nd, 2008

Japanese and Chinese share the use of Chinese characters. The connection goes beyond simply sharing characters, though: many two- and four-character expressions in Japanese come from older Chinese (these are known as Sino-Japanese items in the biz). This is how I can often “cheat” and use my knowledge of Japanese to guess what some Chinese words are saying, even if I have no idea how to pronounce them.

There are, however, many Chinese-Japanese false cognates: words which look the same and often do indeed have a shared etymology, but have quite different contemporary meanings.[1] As such, I’ve often lamented to friends, especially learners of Japanese or Chinese, the lack of a dictionary highlighting these false cognates and how their usage differs between Japanese and Chinese. A couple of weekends ago I was browsing dictionaries in the Page One bookstore in Taipei 101 and I found exactly that: 誤用度100%日語漢字.

Each spread shows the three sets of cognates, with an explanation of the Japanese use, in Chinese, on the left, and vice versa on the right. It’s a godsend.

By the way, here’s my favorite Chinese-Japanese false cognate:

勉強 (べんきょう)

one’s study (N), to study (V) ~する

勉強 (miǎnqiǎng)

  • V
    1. force sb. to do sth. | ¹Bié ∼ tā. Don’t force him to do it.
    2. do with difficulty
  • S.V.
    1. unconvincing; strained | Zhège jìhuà ¹hěn ∼. This plan may not work.
  • Adv
    1. reluctantly; grudgingly | Tā ∼ xiàole yīxià. He forced a smile.
    2. barely enough | Tā ∼ néng shuō jǐ jù Fǎyǔ. She can speak only a little French.

  1. In French, they’re “faux amis,” but I think that sounds more like a spy.