blog

Posts Tagged ‘language’

Testing Google’s Language Detection

Saturday, May 17th, 2008


As Google adds ten more languages to its machine translation service, it seems to be on its way to becoming the most convenient universal translator of the world’s popular languages. Of course, Google’s handling of languages isn’t perfect. In particular, I’ve been complaining to friends for a while about the weaknesses in how Google handles queries written in Chinese character (漢字/汉字) scripts. In this post, I run some tests using Google’s Language Detection service to better understand how it handles queries in Chinese characters.

Background

Chinese characters have been used all across East Asia, most notably in Chinese, Japanese, Korean, and Vietnamese (the “CJKV”). Prescriptivist writing reforms in Communist China and Japan have simplified many characters, though. Some characters were simplified in the same way, some in different ways, and some in only one country but not the other. For more information, there’s Wikipedia or Ken Lunde’s CJKV Information Processing.
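As a small illustration of why differently simplified forms matter to software, here is a sketch of my own (nothing to do with Google’s service). It uses Python’s standard `unicodedata` module on the 漢字/汉字 pair from the introduction. The labels "traditional/Japanese" and "simplified Chinese" are my own shorthand. The sketch only shows that the two spellings are different strings at the code-point level, with 漢 and 汉 encoded separately and 字 shared. It does not do any language detection.

```python
import unicodedata

# The 漢字/汉字 pair: 漢 is the form used in traditional Chinese and Japanese,
# 汉 is the simplified Chinese form. (Labels are illustrative shorthand.)
forms = {"traditional/Japanese": "漢字", "simplified Chinese": "汉字"}

for label, word in forms.items():
    # Print each character with its Unicode code point and official name.
    for ch in word:
        print(label, ch, f"U+{ord(ch):04X}", unicodedata.name(ch))

# Characters common to both spellings (only 字 is shared).
shared = set(forms["traditional/Japanese"]) & set(forms["simplified Chinese"])
print("shared:", shared)
```

A search engine comparing raw strings sees 漢字 and 汉字 as different words, even though they spell the same word.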

The problem

The issue arises when you search for a word written in Chinese characters that clearly comes from one particular Chinese character-using language. In my experience, Google doesn’t use the query to infer which language you are writing in, and it returns many results in the other Chinese character-using languages as well.[^1]


Sign language Super Bowl ad

Monday, February 4th, 2008

I don’t care much for the game, but I always love checking out the Super Bowl ads every year. This year there was something really cool: a sign language ad by a deaf group at PepsiCo.1 Very cool.

The crew has their own website at Pepsi too: Bob’s House.


  1. “Ad” used loosely… does this ad sell anything? 

Patricks Nortons on Tekzillaz

Wednesday, January 9th, 2008

I just noticed something on the latest Tekzilla Daily: Patrick Norton, host of Tekzilla and former host of The Screen Savers, says “there’s a lots to learn here” (1:28) and later “the site you’re having troubles with” (1:39). While “having troubles with…” is fine, I believe “having trouble with…” is much more common. As for “a lots to learn,” however, that’s definitely out. Is it hyperarticulation? I don’t know.

Wikipedia notes: “Norton grew up in the Midwest, but considers the Jersey Shore his home… He currently lives in San Francisco, California.” So, is this a Jersey Shore or California thing? I have no idea.

Setting Language Research to Music

Monday, December 24th, 2007

Via LinguistList:

‘Setting Language Research to Music’ is a Newcastle University project whose aim is to compose orchestra and choral music to demonstrate infant perception and production. The first piece of music to emerge from the project, ‘Swing Cycle’, mimics babies’ experience of discovering word boundaries, taking work by Peter Jusczyk and colleagues as a starting point.

It’s the craziest thing I’ve seen in a long while… it reminds me of the Music: Materials and Design course I took a couple of years ago. My final project was an electronic composition that built a rhythm out of political speech samples, echoes, and cracking noises, representing the hollowness of political rhetoric. It was one of my academic low points at Chicago, for sure.

Maybe it’s because I’m an artist, but I’ve never understood the drive for modern art, including compositions like these. I would much rather listen to some music and read about language acquisition separately… the motivation to combine the two eludes me.

You can listen to The Swing Cycle and read the lyrics (or their approximation) on the Setting Language Research to Music website.

Eats, shoots, and leaves

Monday, December 17th, 2007

I just read Clause and Effect (via DF), a great editorial discussing the commas in the Second Amendment and their effects on how the law is interpreted. I found this timely, as Bailey and I just watched Institutional Memory, the penultimate episode of The West Wing, where Toby Ziegler discusses a comma in the Fifth Amendment’s Takings Clause: “nor shall private property be taken for public use[,] without just compensation.” BBC’s h2g2 has a pretty good write-up, and there’s a list of relevant links as well.

The funny thing about all of these is that we don’t speak commas. Commas are used to represent pauses in speech graphically, but they are often placed according to artificial rules which, applied systematically, aim to help the reader parse the sentence or disambiguate between different readings.1

I’m surprised Language Log hasn’t picked up this new piece yet. UPDATE: Yup, they got to it. Great coverage, as always.


  1. We use pauses in spoken language to do this too, but not necessarily in the same places that we place commas in “good” written language. 


© 2006-2008 mitcho (Michael 芳貴 Erlewine).
The views expressed on these pages are mine alone and do not
reflect those of my employers and clients, past and present.