mitcho Michael 芳貴 Erlewine

Linguist. Fifth-year PhD student at MIT.

blog


Stanley Kubrick on linguistic fieldwork

July 29th, 2013

I have found that when you finally come down to the day [of the elicitation] and you arrive on the location with the [speakers], having had the experience of already seeing some [data], somehow it’s always different. You find out that you have not really explored the [language] to its fullest extent. You may have been thinking about it incorrectly, or you may simply not have discovered one of the variations which now in context with everything else that you have [elicited] is simply better than anything you had previously thought of. The reality of the final moment, just before shooting, is so powerful that all previous analysis must yield before the impressions you receive under these circumstances, and unless you use this feedback to your positive advantage, unless you adjust to it, adapt to it and accept the sometimes terrifying weaknesses it can expose, you can never realize the most out of your [fieldwork].

Stanley Kubrick

The new Apple campus and the Pentagon compared

June 8th, 2011


That is all.

Checking mochitest test coverage

March 22nd, 2011

One of the last bugs for Firefox Panorama was bug 625818: “Check Panorama mochitest test suite coverage”. Our automated tests ensure that we do not regress on existing functionality, but they are only as good as their coverage: how much of the Panorama code base is actually “hit” when the test suite runs.

Panorama went through a pretty rapid development cycle to make it into Firefox 4, which was released today (yay!). Moreover, for a while we were developing outside of mozilla-central, without the usual requirement that patches come with tests. This makes checking its test coverage particularly important.

Check out the final result, the Panorama test coverage report. The good news: our code coverage is 86%! (Some notes on what improvements can be made are in the bug.)


PhiliKON had previously worked on hooking into the JS Debugger service’s interruptHook to measure the coverage of xpcshell tests. I modified this code to run in the Mochitest browser chrome tests instead. This code can be found in the bug.
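JSD is specific to Firefox, but the underlying idea carries over to other runtimes: a debugger hook that fires before each executed line and records which file and line it was on. As a rough analogue in Python, with `sys.settrace` swapped in for JSD's interruptHook (this is not PhiliKON's patch, and `run_with_line_coverage` is a hypothetical name), a recorder producing a file → line → count structure might look like this:

```python
import sys
from collections import defaultdict


def run_with_line_coverage(func, *args, **kwargs):
    """Run func and return (result, hits), where hits maps
    file path -> line number -> number of times that line executed."""
    hits = defaultdict(lambda: defaultdict(int))

    def tracer(frame, event, arg):
        # 'line' events fire each time a new source line is about to run
        if event == "line":
            hits[frame.f_code.co_filename][frame.f_lineno] += 1
        # Returning the tracer keeps per-line tracing enabled for this frame
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        # Always remove the hook, even if func raises
        sys.settrace(None)
    # Convert the nested defaultdicts into plain dicts
    return result, {path: dict(lines) for path, lines in hits.items()}
```

Like the JSD hook, a per-line callback slows every line down considerably.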

With this patch applied, I invoked the test suite with the following command:

TEST_PATH=browser/base/content/tests/tabview COVERAGE_FILTER="*tabview*" COVERAGE=true make -C obj-ff-dbg mochitest-browser-chrome

That’s a regular mochitest-browser-chrome invocation with the COVERAGE=true flag, which turns on code coverage checking, and COVERAGE_FILTER="*tabview*", which filters out results from files that don’t have “tabview” in their paths. This creates a file called coverage.json in the working directory of the test suite, which for me was obj-ff-dbg/_tests/testing/mochitest/.
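The COVERAGE_FILTER step boils down to keeping only the paths that match a glob before the JSON is written out. A minimal Python sketch of that step, assuming results shaped as path → line → hit count; `filter_coverage` and `coverage_to_json` are hypothetical names, not part of the mochitest harness:

```python
import fnmatch
import json


def filter_coverage(hits, pattern):
    """Keep only entries whose file path matches a shell-style glob,
    e.g. '*tabview*' (case-sensitive matching)."""
    return {path: lines for path, lines in hits.items()
            if fnmatch.fnmatchcase(path, pattern)}


def coverage_to_json(hits):
    """Serialize path -> line -> count data as JSON text.
    JSON object keys must be strings, so line 42 becomes "42"."""
    return json.dumps(hits, indent=1, sort_keys=True)
```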

This JSON file is a nested object, keyed first by file path and then by line number. The file paths have, as far as possible, been converted into local filesystem paths. PhiliKON built a script which produces beautiful reports based on this output.
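Turning data like this into a single figure such as the 86% above also requires knowing which lines are executable at all, which a record of hit lines doesn't provide on its own. Assuming a separate, hypothetical `executable` map from each path to its set of executable line numbers, and assuming the values in the JSON are hit counts, the arithmetic might look like:

```python
def coverage_percent(hits, executable):
    """Percentage of executable lines that were hit at least once.

    hits:       path -> {line (int or str): hit count}
    executable: path -> set of executable line numbers (assumed to come
                from elsewhere, e.g. a parser; not part of coverage.json)
    """
    covered = total = 0
    for path, lines in executable.items():
        # JSON round-trips line numbers as strings, so normalize to int
        hit_lines = {int(n) for n, count in hits.get(path, {}).items() if count > 0}
        covered += len(lines & hit_lines)
        total += len(lines)
    return 100.0 * covered / total if total else 0.0
```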

A word of warning: running with this JSD interruptHook is ridiculously slow. A number of the Panorama tests are timing-dependent (drag-and-drop tests, for example), so some of them fail, but that’s okay: as long as a test finished without timing out, it really did run through all of its code. In order to get through everything with some degree of control, I split the mochitest tabview suite into a few chunks. I then took the resulting coverage.json files and passed them to another script, tools/coverage/aggregate.py, which merges multiple JSON results like this into a single JSON file. Finally, I passed this aggregate JSON file to PhiliKON’s wonderful report script and, voilà, the Panorama test coverage report! Easy as pie.
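tools/coverage/aggregate.py isn't reproduced here, so the following is only a guess at the idea rather than its actual code: merge the per-chunk results by summing the hit counts for each (file, line) pair. It assumes the inner values are numeric hit counts; `merge_coverage` and `merge_coverage_files` are hypothetical names.

```python
import json
from collections import defaultdict


def merge_coverage(results):
    """Merge several path -> line -> count dicts by summing counts,
    so a line hit in any chunk counts as hit in the aggregate."""
    merged = defaultdict(lambda: defaultdict(int))
    for result in results:
        for path, lines in result.items():
            for line, count in lines.items():
                merged[path][line] += count
    return {path: dict(lines) for path, lines in merged.items()}


def merge_coverage_files(paths, out_path):
    """Read each chunk's coverage.json and write one aggregate JSON file."""
    results = []
    for p in paths:
        with open(p) as f:
            results.append(json.load(f))
    with open(out_path, "w") as f:
        json.dump(merge_coverage(results), f)
```

Summing rather than overwriting matters because the same file is often exercised by tests in several chunks.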

Happy Halloween!

October 29th, 2010

Happy Halloween from the Firefox Panorama team!

We carved some pumpkins a couple of days ago in my department. I carved the Panorama logo above, as well as one of the Stata Center.

More Jack-O-Lantern photos, including great ones of Chomsky and Norvin Richards, are up on Flickr.

Voicemail from Jesse

July 3rd, 2010

My friend Jesse left me a voicemail on my Google Voice number. Here’s a demo of the fantastic transcription feature.

Voicemail from Jesse from mitcho on Vimeo.

Every website has a purpose

June 2nd, 2010

Every website has a purpose. Maybe you want people to buy a product, donate to your cause, download your app, or subscribe to your mailing list. How can you confidently modify your site to make it more effective with respect to this goal?

A/B testing is a process in which multiple variants of a website are shown at random to different users, and statistical tools are used to determine whether any variant is more effective according to an overall goal metric, such as conversions or revenue.
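As one concrete illustration, and not necessarily what ShrimpTest will do, users can be split into variants by hashing an identifier, and two variants' conversion rates can then be compared with a standard two-proportion z-test. A minimal sketch using only Python's standard library; the function names are hypothetical:

```python
import hashlib
import math


def assign_variant(user_id, variants):
    """Bucket a user pseudo-randomly but stably, so repeat visits see the same variant."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test of H0: variants A and B have the same conversion rate.

    Returns (z, p_value); z > 0 means B converted better than A.
    """
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Every visitor converted, or none did: the rates are identical
        return 0.0, 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2))
    return z, math.erfc(abs(z) / math.sqrt(2))
```

With 100 conversions out of 1,000 visitors for A and 150 out of 1,000 for B, z comes out above 3 and the p-value is well under 0.01. Hashing instead of calling a random number generator keeps a returning visitor in the same bucket without storing any state.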

Various A/B testing products exist, many of them free, but none is built from the ground up to work within the WordPress ecosystem. I believe a solution made specifically for WordPress could make A/B testing much easier and more straightforward, and that such a solution could greatly benefit the platform as a whole.

I’m happy to announce my new project, code-named ShrimpTest,¹ which is directly aimed at filling this void. I’ll be working on this project this summer together with the fantastic folks at Automattic.

The best way to keep up with development is the project’s development blog, the ShrimpTest P2. Most updates will likely be much shorter than this initial post. ;) You can get less frequent, milestone-like updates by following ShrimpTest on twitter. Development will be open, so feel free to check it out (haha) and submit patches as well. I look forward to your feedback as I go along.


  1. Five dollars to the first person to correctly guess why I’m calling it ShrimpTest. 

Better Linguist List RSS Feeds

April 26th, 2010

Everyone I know in linguistics uses the LINGUIST List website to a greater or lesser degree. Linguist List began as a mailing list in the 1990s, carrying book, job, and dissertation announcements, calls for papers, and general academic discussions.

Nowadays many people follow the various announcements on Linguist List using an RSS feed reader such as Google Reader or my personal favorite NetNewsWire.

Unfortunately, the Linguist List RSS feeds (at least recently) don’t include the full text of the articles and have a few other quirks as well. It’s often hard to judge from the title whether an announcement is really something I’m interested in, so I’ve spent a lot of time frustratedly opening every possibly interesting-looking entry in a separate NetNewsWire tab. Today I decided enough was enough: I wrote a script which parses each of the Linguist List RSS feeds, finds the actual descriptions, and interleaves them.¹ It’s working remarkably well so far.
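The script isn't included in the post, but the approach can be sketched. Assuming standard RSS 2.0 feeds, and a caller-supplied `fetch_description(link)` that retrieves an announcement's full text from its page (the helper, and how it would scrape Linguist List pages, are hypothetical), each item's `<description>` is replaced with the full text:

```python
import xml.etree.ElementTree as ET


def interleave_descriptions(rss_text, fetch_description):
    """Return RSS XML in which each item's <description> holds the full text.

    fetch_description(link) -> str is supplied by the caller; the returned
    text is stored as escaped character data, not parsed as markup.
    """
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        link = item.findtext("link")
        if not link:
            continue  # nothing to fetch for this entry
        desc = item.find("description")
        if desc is None:
            desc = ET.SubElement(item, "description")
        desc.text = fetch_description(link.strip())
    return ET.tostring(root, encoding="unicode")
```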



  1. Veteran Linguist List RSS subscribers will also note that I’m adding the full title to the entry title for the Conferences and Calls lists as well. 

Beginning development with Jetpack SDK 0.2

April 14th, 2010

This article is a translation of a recent Japanese article by fellow Jetpack Ambassador Gomita, which was published on the Mozilla Labs Jetpack blog. I’m cross-posting it here for posterity.

Mozilla Labs recently released version 0.2 of the Jetpack SDK, which fixes some issues in the 0.1 release, such as a glitch affecting development on Windows. SDK 0.2 doesn’t include the planned APIs for rapid development of new browser functionality, but you can still play with it to get a flavor of development with the Jetpack SDK.

In this article we begin by setting up an SDK 0.2 development environment and then explain the steps required to develop a simple, practical add-on with it. The instructions here are for Windows, but the basic steps are the same on every platform.


Spring is for Speaking: JSConf, WordCamp SF, IACL

March 20th, 2010

I recently confirmed three very exciting speaking gigs for this spring:


Jetpacking in Boston

March 13th, 2010

A couple of weeks ago I gave a talk at the Boston Javascript meetup introducing Jetpack and filling people in on the latest developments in the project, including the Reboot. Between 20 and 30 people came to the talk, which was held at Microsoft Cambridge. Here are the slides from the talk:¹

Extend the Browser with Jetpack



  1. If anyone would like the Keynote deck, just let me know. 

Jetpack Ambassadors in MV

February 21st, 2010

A couple of weeks ago I went out to Mozilla HQ in Mountain View for a Jetpack Ambassador meetup. Jetpack is a Mozilla Labs project intended to make writing Firefox add-ons easier, and it shares some ancestry with the Ubiquity project, which is dear to my heart. The Jetpack Ambassadors are a team of Mozilla community members who will be involved in Jetpack marketing: evangelizing Jetpack and writing about our own experiences working with the exciting new Jetpack architecture.

We spent a good chunk of time with a team from Invisible Elephant, who came in to give us some training on making technical presentations, and then dug into the code on Day 2. It was great to have geniuses at Mozilla Labs like Atul and Myk show us the latest Jetpack code, and to hear the latest project direction from Daniel, Aza, and Nick, which showed us how much careful consideration and effort has gone into the Jetpack reboot.

The best part of the whole experience, though, has to be the fellowship with the other Jetpack Ambassadors. The Ambassadors came from all over the world, encompassing Europe, Asia, S. America, and of course N. America. Each of them is involved with some really exciting projects and has made a name for themselves in their respective communities. I’ve put together a twitter list of all the Jetpack Ambassadors and the core team members and invite you to follow them.

We also had the greatest number of Ubiquity core developers to have ever been in the same place at the same time, which of course had to be documented. :)

(More photos can be seen in my gallery.)

I had a fantastic time in MV and it was a shame I could only be there for such a short time. I feel honored to be a part of this group and am looking forward to speaking on Jetpack soon at an event near you!

After the Deadline for Firefox

February 1st, 2010

After the Deadline is a powerful and intelligent proofreading tool which checks for spelling errors, misused words, some grammatical gaffes, and even some stylistic issues. For the past month, I’ve been working for Automattic, the company behind AtD and the makers of WordPress.com, to create a Firefox add-on which enables this superior technology everywhere on the web. Words can’t do justice to the magic that is AtD, so here’s a video we put together:

I invite you all to give it a spin:


Working on After the Deadline for Firefox gave me my first experience creating an add-on from the ground up and I’ve learned a lot. After working on Ubiquity and dabbling with Jetpack, it’s given me another perspective on extensibility on the web and I look forward to thinking and writing more about these experiences in the near future.

In the meantime, happy proofreading! :D

WordCamp Boston 2010

January 29th, 2010


This past weekend I gave a couple talks at the inaugural WordCamp Boston. WordCamps are local, community-organized events for WordPress users and enthusiasts. We had about 400 people at the Microsoft Cambridge campus.


Creating an image-sized iframe overlay with Shadowbox

January 13th, 2010

I have recently been working with the Shadowbox JavaScript library for an upcoming revision to the MIT Edgerton Digital Collections website. Shadowbox is a lightbox library designed to work with various JavaScript libraries like jQuery, Prototype, and MooTools, with a nice modular design.

Shadowbox is organized around different “players”, one for each kind of media that will be displayed. By default the library comes with players for images, Flash, HTML fragments, iframes, QuickTime, and Windows Media. Some of these players, like those for images and video, automatically detect the media size and adjust the lightbox accordingly, while others, such as the iframe player, use a set size or fill the screen. For the Edgerton site, though, we needed to display an iframe in the dimensions of a given image, so that we could show the image with an overlay. Here are some notes on how to implement a custom player for Shadowbox.
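The post's player code isn't included here, but the core sizing arithmetic is language-neutral: give the iframe the image's dimensions, scaled down if needed to fit the available space while keeping the aspect ratio. It's sketched below in Python; a real Shadowbox player would be written in JavaScript, and `fit_to_viewport` is a hypothetical helper, not part of Shadowbox's API.

```python
def fit_to_viewport(img_w, img_h, max_w, max_h):
    """Scale an image's (width, height) down, never up, so it fits inside
    (max_w, max_h) with its aspect ratio preserved. The result is the size
    to give the overlay iframe."""
    scale = min(1.0, max_w / img_w, max_h / img_h)
    return round(img_w * scale), round(img_h * scale)
```

For example, a 1600×1200 image in an 800×800 space gets a 800×600 iframe, while a 400×300 image is left at its natural size.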


Disgusting Word-formatted HTML and how to fix it

December 30th, 2009

In working on a new website for the MIT Working Papers in Linguistics, I recently inherited a collection of HTML files with all of our books’ abstracts. To my dismay (but not surprise), the markup in these files was horrendous. Here are some of the cardinal sins of markup that I saw committed in these files:

  1. Confusing ids and classes. ids should be unique on the page… but here, the same id is used multiple times in order to format those elements together.
4.2.1
161
Old French (Adams 1987)
<div id="indent">
4.2.2
164
The evolution of the dialects of northern Italy

2. Putting a class on every instance of something. Every paragraph should be formatted the same way; we get the point.

The English Noun Phrase in Its Sentential Aspect

Steven Paul Abney

May 1987

3. Using blank space for formatting.

&nbsp;

4. CSS classes that don’t exist. Browsers just ignore these anyway…

<p class=MsoNormal>One factor in determining which worlds a modal quantifies over is the temporal argument of the modal’s accessibility relation. It is well-known that a higher tense affects the accessibility relation of modals. What is not well-known is that there are aspectual operators high enough to affect the accessibility relation of modals.

The solution

My solution was to write a Perl script which takes care of a number of these issues. It’s not foolproof and doesn’t involve any voodoo (for example, it can’t re-typeset things which were formatted using whitespace), but it does a good job as a first pass.

You can run the script by making it executable (chmod +x cleanwordhtml.pl) and then passing the source filename as an argument; the cleaned HTML is printed to standard output. For example,

./cleanwordhtml.pl source.html > clean.html

I used this with a simple bash for loop to run over all my files:

for f in /.html; do ./cleanwordhtml.pl "$f" > "${f%.html}-clean.html"; done
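cleanwordhtml.pl itself isn't reproduced in this post. To illustrate the kind of first pass described above, here is a rough Python sketch (not the Perl script's actual rules) that targets three of the sins with regular expressions: repeated ids, Mso* classes, and &nbsp;-only paragraphs. Regular expressions are not an HTML parser, so this assumes Word-style markup like the examples above; `clean_word_html` is a hypothetical name.

```python
import re


def clean_word_html(html):
    """Rough first-pass cleanup of Word-exported HTML using regexes."""
    # Sin 4: drop Word's Mso* classes, quoted or unquoted (class=MsoNormal)
    html = re.sub(r"""\s+class=(["']?)Mso\w+\1""", "", html)
    # Sin 1: an id that appears more than once becomes a class
    # (assumes those elements have no class attribute of their own)
    ids = re.findall(r'(?<=\s)id="([^"]+)"', html)
    for dup in {i for i in ids if ids.count(i) > 1}:
        html = re.sub(r'(?<=\s)id="%s"' % re.escape(dup),
                      lambda m, d=dup: 'class="%s"' % d, html)
    # Sin 3: remove paragraphs containing only whitespace or &nbsp;
    html = re.sub(r"<p(?:\s[^>]*)?>(?:\s|&nbsp;)*</p>", "", html)
    # Collapse the runs of blank lines left behind
    return re.sub(r"\n\s*\n+", "\n", html)
```

Only ids that actually repeat are converted, so legitimately unique ids are left alone.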

Hopefully someone else can benefit from my experience.