Posts Tagged ‘linguistics’

Ubiquity in Portuguese

Thursday, March 5th, 2009

Felipe, a Ubiquity user, put together a wonderful look at what Ubiquity might look like in Portuguese. He has some great points here, particularly regarding the “map” verb used in English: Felipe points out that Portuguese does not have a very common “map” verb and that it would be much more common to enter me dê (literally “me give”) to request a map. This is a great example of how Jono’s overlord verbs proposal may be an important aspect of our i18n efforts. The post is also timely, as we’ve recently been discussing in our regular meetings (open to all!) that Portuguese may be the focus of our next parser construction efforts.

What would the challenges be for Ubiquity in your language? We’d love to see an increasing number of blog posts on this topic in different languages. Thanks Felipe! ^^

Unnatural by design

Sunday, March 1st, 2009

I’m flying over the Pacific Ocean right now but a little bit of language caught my eye. Here’s a picture of the menu for this flight, in three languages: English, Japanese, Chinese.

menu.jpg

What caught my eye is the line “served with ご一緒に 配,” meant to be read as part of “Beef in BBQ sauce… served with Pepsi…”. The Chinese 配 (pèi) is fine here, meaning “with,” but the Japanese “ご一緒に” (goissho-ni) seemed awkward to me.

(more…)

Localizing Ubiquity: an open letter to linguists

Thursday, February 26th, 2009


Localizing Ubiquity: an open letter to linguists from mitcho on Vimeo.

Below is a transcript of this video. Please distribute this video far and wide to anyone who may be interested. ^^

(more…)

Ubiquity in Firefox: Focus on Japanese

Friday, February 20th, 2009

One of the eventual goals of the Ubiquity project is to bring some of its functionality and ideas to Firefox proper. To this end, Aza has been exploring some possible options for what that would look like (round 1, round 2). All of his mockups, however, use English examples. I’m going to start exploring what Ubiquity in Firefox might look like in different kinds of languages. Let’s kick this off with my mother tongue, Japanese.

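Going forward I will be looking at Ubiquity in Firefox for a variety of languages, and today I take up Japanese. I plan to post the same content in Japanese at a later date. ^^ Comments in Japanese are very welcome too!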

(more…)

Contribute: how your language identifies its arguments

Wednesday, February 18th, 2009

Earlier today I blogged on three different strategies languages use to mark the roles of different arguments: word order, marking on the arguments, and marking on the verbs.

I gathered some data from the fantastic World Atlas of Language Structures to put together a survey of many of the languages on the Internet. For each of the languages, I got the canonical word order and whether the language marks the roles of its arguments on the verb and/or the arguments themselves.

As you can see, there are a number of data points that are still missing. Please contribute information on the languages you speak! You can edit the spreadsheet on Google Docs. Thanks!

Three ways to argue over arguments

Wednesday, February 18th, 2009

UPDATE: Contribute information on how your language identifies its arguments here.

When we execute a command in Ubiquity, in very simple terms, we’re hoping to do something (a verb) to some arguments (the nouns). Every sentence in every language uses some method to encode which arguments correspond to which roles of the verb. Here are a couple examples:

(1) He sees Mary.
(2) 彼が Maryを 見る。 (Kare-ga Mary-o miru.)

As a speaker of English, you can read sentence (1) above and know exactly who is doing the seeing and who is being seen, and speakers of Japanese can get the same information from (2). How do different languages code for arguments in different roles? There are, broadly speaking, three different ways:

three ways to code for arguments in different roles

We’ll take a brief look today at these three different strategies, all of which a localizable natural language interface will surely encounter.
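To make two of these strategies concrete, here is a minimal Python sketch that recovers the roles in the two example sentences: once by word order (English) and once by marking on the arguments (Japanese). The function names and the tiny particle table are hypothetical and only for illustration; this is not how Ubiquity's parser works, and the third strategy (marking on the verb) is not shown.

```python
# Hypothetical illustration only: a two-entry table of Japanese case particles.
JAPANESE_PARTICLES = {"が": "subject", "を": "object"}


def roles_by_word_order(words):
    """English-style strategy: position in a subject-verb-object sentence gives the role."""
    subject, verb, obj = words
    return {"subject": subject, "verb": verb, "object": obj}


def roles_by_marking(words):
    """Japanese-style strategy: a particle attached to each argument gives its role.

    Any word without a known particle is assumed to be the verb (a simplification).
    """
    roles = {}
    for word in words:
        particle = word[-1]
        if particle in JAPANESE_PARTICLES:
            roles[JAPANESE_PARTICLES[particle]] = word[:-1]
        else:
            roles["verb"] = word
    return roles


english = roles_by_word_order(["He", "sees", "Mary"])
japanese = roles_by_marking(["彼が", "Maryを", "見る"])
# Swapping the two arguments does not change the roles when they are marked.
scrambled = roles_by_marking(["Maryを", "彼が", "見る"])
```

The scrambled call shows why the strategy matters for a parser: with marking on the arguments, the order of the arguments carries no role information, while the word-order version would silently swap subject and object.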

(more…)

Gaba, Shame On You

Monday, January 12th, 2009

A Gaba ad on a train

Here’s a picture of an ad for Gaba, a big English conversation school in Japan, that I snapped on a train recently. I felt the English sentence about Gaba’s satisfaction was extremely awkward, so I put it up on Twitter to check with some other native speakers. My friends concurred. What do you think?

I personally think the sentence would be improved by removing the “the” in “the satisfaction.” Others offered “continues to rise” as possibly preferable to “continually rise.” English articles, especially with the definiteness of abstract nouns, are very difficult for many non-native speakers. That being said, it’s sad for a sentence of such questionable acceptability to come from a company which, in theory, prides itself on its English ability and surely hires many native speakers. Gaba, shame on you.

回収 vs. 収集 and Better Word Meanings Through Usage

Thursday, September 18th, 2008

Bailey just asked me what the difference between 回収 (kaishū) and 収集 (shūshū) is: two words that would both map to the English verb “collect.” I intuitively came up with a hypothesis to explain the distinction:

  • 回収 may take things away from others when collecting while 収集 does not have that implication.
  • Things that you 回収 may have been previously distributed by the actor themself while 収集 does not have that implication.1

Not content with armchair theorizing, however, I decided to take advantage of one of the largest corpora in the world: Google.2 To test my hypothesis, I chose two “objects of collection”, one you can take away (and which is often distributed first) and one you can’t take away: アンケート (ankēto “survey,” from the French enquête) and 意見 (iken “opinion”). I then searched for the four resulting collocations3 on Google in quotes (“•”) and recorded how many hits there were.
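As a small sketch of the setup, the four quoted searches can be generated mechanically from the two objects and the two verbs. The exact form of each phrase is my assumption here: I join object and verb with the object particle を, which this excerpt does not spell out, and the function name is made up for illustration.

```python
from itertools import product

VERBS = ["回収", "収集"]
OBJECTS = ["アンケート", "意見"]


def quoted_queries(objects, verbs, particle="を"):
    """Build one exact-phrase (quoted) query per object/verb pair.

    The object + particle + verb shape is an assumption for illustration.
    """
    return [f'"{obj}{particle}{verb}"' for obj, verb in product(objects, verbs)]


queries = quoted_queries(OBJECTS, VERBS)
for query in queries:
    print(query)
```

Wrapping each phrase in quotes is what asks the search engine for the words as a sequence rather than as two independent terms.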

(more…)


  1. This second point could also be hypothesized based on the component meaning of 回, which in the verb 回る (mawa=ru) can mean “circle back.” 

  2. Google is of course a huge corpus but it has very limited search and can easily be misused and misunderstood, thus making Google an unreliable (unprofessional) source for statistical data. One Google alternative for some different statistics is the n-gram data they offer for research. 

  3. “Collocation” on Wikipedia says: “Within the area of corpus linguistics, collocation is defined as a sequence of words or terms which co-occur more often than would be expected by chance.” 

Testing Google’s Language Detection

Saturday, May 17th, 2008

google code

As Google adds ten more languages to its machine translation service, it seems to be on its way to becoming the most convenient universal translator of the world’s popular languages. Google’s handling of languages of course isn’t perfect, however—in particular, I’ve been complaining to friends for a while about the weaknesses of Google’s handling of queries in Chinese character (漢字/汉字) scripts. In this post, I run some tests using Google’s Language Detection service to try to better understand its handling of Chinese character queries.

Background

Chinese characters have been used all across East Asia, most notably in Chinese, Japanese, Korean, and Vietnamese (the “CJKV”). Prescriptivist writing reforms in Communist China and Japan have simplified many characters, though. Some characters were simplified in the same way, some in different ways, and some in only one country but not the other. For more information, there’s Wikipedia or Ken Lunde’s CJKV Information Processing.
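To illustrate those three outcomes, here is a small Python sketch with one well-known character for each case (plus one for each direction of the "only one country" case). The four characters are my own examples rather than data from the tests in this post, and the table and function names are hypothetical.

```python
# traditional form -> (current Japanese form, current PRC form)
EXAMPLES = {
    "國": ("国", "国"),  # simplified the same way in both
    "廣": ("広", "广"),  # simplified differently in each
    "車": ("車", "车"),  # simplified only in the PRC
    "假": ("仮", "假"),  # simplified only in Japan
}


def classify(traditional):
    """Describe how the two reforms treated one traditional character."""
    japanese, chinese = EXAMPLES[traditional]
    jp_changed = japanese != traditional
    cn_changed = chinese != traditional
    if jp_changed and cn_changed:
        return "same" if japanese == chinese else "different"
    if cn_changed:
        return "PRC only"
    if jp_changed:
        return "Japan only"
    return "unchanged"


for character in EXAMPLES:
    print(character, classify(character))
```

A query written with 車 is therefore compatible with Japanese or traditional Chinese but not with PRC-style writing, which is the kind of signal a search engine could in principle use.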

The problem

The issue comes up when you try to search for a word in Chinese characters which clearly came from one Chinese character-using language. From my experience, Google doesn’t use the query to work out which language you are writing in, and returns many results in other Chinese character-using languages as well.[^1]

(more…)

Linguistics in 嘉義

Tuesday, May 13th, 2008

A couple weeks ago I went to Chiayi (嘉義, pinyin: Jiāyì) to present a paper at the Linguistic Society of Taiwan’s National Conference on Linguistics.[^1] I got a chance to meet some wonderful and kind Taiwanese linguists, make friends with some linguistics students, as well as explore the city of Chiayi.

(more…)

White Protestants and Catholics don’t frequently attend religious services

Wednesday, February 13th, 2008

Breaking news from the Potomac Primaries:

White Protestants and Catholics backed Mrs. Clinton, but Mr. Obama was strongly supported by voters who frequently attend religious services.

Seeing as backing Mrs. Clinton and supporting Mr. Obama are, in terms of votes, mutually exclusive, this sentence entails that white Protestants and Catholics (the majority of ) are not a part of “voters who frequently attend religious services”, as is demonstrated by the infelicity of the following sentence:

“Group A did A, and Group B did not do A — but Group A is part of Group B.”

Well, that just settles it then.

Eats, shoots, and leaves

Monday, December 17th, 2007

I just read Clause and Effect (via DF), a great editorial discussing commas in the Second Amendment and their effects on interpretation of the law. I found this timely as Bailey and I just watched Institutional Memory, the penultimate episode of The West Wing, where Toby Ziegler discusses a comma in the Fifth Amendment’s takings clause: “nor shall private property be taken for public use[,] without just compensation.” BBC’s H2G2 has a pretty good write-up and there’s a listing of relevant links as well.

The funny thing about all of these is that we don’t speak commas. They’re used to graphically represent pauses in speech, but are often used according to certain artificial rules which, when applied systematically, aim to help the reader parse the sentence or disambiguate between different readings.1

I’m surprised Language Log hasn’t picked up this new piece yet. UPDATE: Yup, they got to it. Great coverage, as always.


  1. We use pauses in spoken language to do this too, but not necessarily in the same places that we place commas in “good” written language. 

ETA-ROC and Another Weekend in Taipei

Monday, November 12th, 2007

I spent this past weekend in Taiwan, attending the English Teaching Association of the Republic of China (ETA-ROC) conference. While the original intention was for a number of us ETAs to go, it ended up that I went alone. I saw a number of talks Saturday… I went to a number of the more theoretical or quantitative talks and had a great time. I saw Krashen talk again, this time on the Comprehension Hypothesis. I have to say, he’s a fabulous speaker, and the case studies he looked at for this talk were fascinating: a Mexican immigrant who worked in a deli and learned Hebrew before he knew it, a culture where the rule is that you can’t marry someone who speaks the same language as you, etc. ^^ I also saw Andrew Cohen from Minnesota, which made me miss Minnesota a bit.

The conference was held at the Chien Tan Youth Activity Center which has a beautiful pond and great view of the Grand Hotel, on the site of an old Shinto shrine.

IMG_9767IMG_9768 IMG_9770

As I recently did a little editing for a journal on English teaching here, I was invited to the presenters’ dinner Saturday night. While it was slightly awkward at first, not being a presenter myself, I soon met two representatives from the Korea and Philippines TESOL organizations who were very kind to me and we had some great conversations and laughs. (They are the two on the right in the first photo. The second photo is with the Filipino representative, Bernard Spolsky and me.)

PB100231PB100233

I stayed overnight Saturday at the Eight Elephants hostel. Less than a year old, Eight Elephants is stylish, clean, and comfortable, though not the cheapest hostel in town. My experience there was great… I made a friend, a student of Special Education from Kaohsiung, and we went out to the nearby Shida night market. After we randomly ran into Kate, who was in Taipei with her host family, my new friend took me to a cafe she knew and we had a great time talking. While her English is great as well, we were talking completely in Chinese. After spending the day thinking about comprehensible input, it was great listening to her, understanding about 80%, and chiming in once in a while. As her interests were teaching and learning languages (including Japanese), we hit it off well with some great conversation. I look forward to seeing her again when I visit Kaohsiung in the near future.

On Sunday morning I saw another talk by Andrew Cohen, had lunch, and met up with a couple of the interns at the Fulbright Taiwan foundation who showed me around Taipei. We went to the Chiang Kai-shek Memorial Hall and randomly ran into Dr. Wu Jing-jyi, the director of the Foundation, on the plaza. We then went to check out the Taipei Modern Art Museum (with the first .museum address I’ve ever actually seen), which was super cheap and very enjoyable, albeit relatively small. (The last photo below is at the Taipei Story House, which is a historic building; we just took a picture outside without going in.)

IMG_9776IMG_9777IMG_9778 IMG_9779

We had some Hong Kong-style 燒臘 preserved meat for dinner. I came back to Nanao Sunday night feeling fulfilled and blessed by the people I’d met all weekend, at the conference, at the hostel, and around the city.

The Nerd Handbook

Monday, November 12th, 2007

From Rands in Repose’s Nerd Handbook, probably a good guide for Bailey (though I don’t quite fit the target completely):

But in nerds’ bit-based work, progress is measured mentally and invisibly in code, algorithms, efficiency, and small mental victories that don’t exist in a world of atoms.

I feel this phenomenon exists in formal linguistics as well, where the elegance of an analysis may be measured in theory-internal terms. It’s hard to get other people excited when they don’t share that same background, precisely because there is no physical manifestation of an analysis. At least Bailey’s good about listening, trying to understand, and being happy for me. ^^

(via Daring Fireball)

Krashen The Party

Friday, November 9th, 2007

Yesterday we ETAs went to a workshop at Lan-Yang Institute of Technology. The workshops focused on reading instruction. The three afternoon sessions we saw included two workshops on building vocabulary and one by Stephen Krashen.

Krashen is kind of like the Chomsky of language acquisition and teaching: a huge and controversial (some may say incendiary) figure who you can love or hate, but can’t ignore. Last Wednesday in our weekly workshop, Dr. Collins delivered a chronological rundown of Krashen’s theories.1 As an entertaining aside, one task given to us was to draw a schematic diagram of Krashen’s view of language acquisition and production. Below is Dale’s drawing, which eerily reflects the geography of the brain… the input comes in through the ears (or eyes, at the back of the brain), then hits the Affective Filter (the amygdala), goes to the Language Acquisition Device (Broca’s and Wernicke’s areas), then the output is filtered by the Monitor, a product of conscious learning (the frontal lobe). Pretty creepy.

dales-brain

Krashen’s talk2 was fascinating, albeit not what I expected: given that the workshop’s focus was on the teaching of reading and that he himself has been a big advocate of recreational reading for language learners, I expected more on teaching English reading to non-native speakers. The majority of the talk, though, was on writing and the composing process: “reading more makes you a better writer, but writing more makes you smart.” He talked about how the act of (regular) writing clarifies and organizes our thoughts, and advocated for a writing process which involved much revision as, “every time you have to revise, it means you’ve become smarter,” and building relaxation (to allow for eureka moments) into the process. His conclusions and analysis are just as important for first-language speakers as for second-language learners, and the talk did feel more like a writing seminar than a pedagogical one. Krashen is an engaging and entertaining speaker, using many examples from famous writers and common experience to draw his conclusions.

The intensity with which he spoke and the passion for thinking about thinking reminded me of Sally’s Honors Analysis class, which was as much about thinking as it was about mathematics. Sally once told us that, when we’re stuck on a problem, we should find someone just about as smart as us and just explain the problem to them. He claimed that the majority of the time, the simple process of explaining the problem out loud and answering clarifying questions would make the solution come to us. It’s a powerful technique that I’ve used many times at Chicago and elsewhere, and Krashen’s analysis of what happens when we write thus struck a chord with me.

Afterwards I was fortunate enough to go out to dinner with the speakers, some of our advisors, and some faculty from the Institute that hosted the workshop. I had some great conversations about my background, where my future directions may lie academically, and of course the ideas. ^^ It reminded me of dinners with linguists back at home, after a workshop or CLS. I realized I miss the fraternity of academia—the sense of mutual respect and interest academics have for each other’s work and ideas, even if the “other” is only 22 years old.


  1. A similar basic rundown of Krashen’s various theories is found in this blog post, The Krashen Revolution.

  2. Krashen, Stephen. “What is Academic Language Proficiency,” presented at the International Conference and Workshops on English Language Teaching: Pedagogical Aspects of Reading. Yilan County, Taiwan, November 8th, 2007. 


© 2006–2011 mitcho (Michael 芳貴 Erlewine).
The views expressed on these pages are mine alone and do not
reflect those of my employers and clients, past and present.