blog

The new Apple campus and the Pentagon compared

June 8th, 2011

+ + =

That is all.

The next generation of related posts: announcing the YARPP-BlogGlue partnership

June 7th, 2011

YARPP hearts BlogGlue

On January 2nd, 2008, from the rural town in eastern Taiwan where I was then working, I committed version 1.0 of the Yet Another Related Posts Plugin (YARPP) to the WordPress plugins repository. Since then, YARPP has been downloaded almost one million times and has evolved greatly. It also has been the impetus for my further involvement in the WordPress open source community.

I believe the success of YARPP comes from its simple premise: display related content to visitors and they will stay on your site longer. Your visitors learn more about you and they in turn find more compelling content that they’re interested in.

One oft-requested feature for YARPP has been the ability to display related posts from across multiple sites. I’ve thought about what this would involve technically (the infrastructure, the algorithms, the queries) and have chosen not to go down this route. To do so would require a serious investment of time and resources, which doesn’t make sense for me while I’m still in grad school.

In January, though, I met the team at BlogGlue while in Phoenix for WordCamp Phoenix. It was refreshing to meet a team that has really thought through the technical aspects of calculating “relatedness” of content, as well as the potential social and economic impact that this type of related content suggestion can have. They’ve been working in this space for a couple of years and have a patented, scalable algorithm for computing “relatedness” of content from across thousands of different sites. When you sign up with BlogGlue, it displays “related links” on your blog posts, just like YARPP does, but from your own site as well as from other content sources that you recommend. In turn, your blog posts may show up on your partners’ sites as well.

What surprised me during that visit were some of the statistics they had gathered from their users: not only does BlogGlue bring traffic to your site, but visitors who come in via the BlogGlue network stay on your site 3–4 times longer than visitors from Google and view 2–4 times as many pages. This makes complete sense to me based on my experience with YARPP: people want to see content that is relevant to them. A search query is one way to find content you’re interested in, but it’s imperfect. With related posts across a network, the system can find related content based on all the content on the current page, which is a much richer “cue” than a simple search query would ever be.

Today I’m pleased to announce that YARPP has officially partnered with BlogGlue. Think of BlogGlue as “YARPP Pro”: it’s the multi-site, social, next generation of YARPP. YARPP helps your visitors discover more of your own content once they’re on your site—BlogGlue additionally helps more people discover your content in the first place, in a rich, meaningful way.

YARPP will of course continue to exist and be supported by me. BlogGlue and I will be advising each other on technical aspects of our offerings, pushing both products forward, as well as cross-promoting our offerings. The latest version of YARPP, 3.3, already has some UI simplifications which came out of a conversation with the BlogGlue team. I’m looking forward to working with them more in the future and I urge you to try out BlogGlue on your own site as well.

Checking mochitest test coverage

March 22nd, 2011

One of the last bugs for Firefox Panorama was bug 625818: “Check Panorama mochitest test suite coverage”. Our automated tests ensure that we do not regress on existing functionality, but a test suite is only as good as its coverage: how much of the Panorama code base is actually “hit” in the process of running the test suite.

Panorama went through a pretty rapid development cycle, making it into Firefox 4 which was released today (yay!). Moreover, for a while we were developing outside of mozilla-central, without the regular “patches require tests” requirement. This makes checking its test coverage particularly important.

Check out the final result, the Panorama test coverage report. The good news: our code coverage is 86%! (Some notes on what improvements can be made are in the bug.)

code coverage report

PhiliKON had previously worked on hooking into the JS Debugger service’s interruptHook to test xpcshell tests. I modified this code to run instead in the Mochitest browser chrome tests. This code can be found on the bug.

With this patch applied, I invoked the test suite with the following command:

TEST_PATH=browser/base/content/tests/tabview COVERAGE_FILTER="*tabview*" COVERAGE=true make -C obj-ff-dbg mochitest-browser-chrome

That’s a regular mochitest-browser-chrome invocation with the COVERAGE=true flag, which turns on code coverage checking, and COVERAGE_FILTER="*tabview*", which filters out results from files which don’t have “tabview” in their paths. This creates a file called coverage.json in the working directory of the test suite, which for me was obj-ff-dbg/_tests/testing/mochitest/.

This JSON file is a multidimensional array, with file paths and then line numbers as keys. The file paths here, as best as possible, have been converted into local filesystem paths. PhiliKON built a script which produces beautiful reports based on this output.

A word of warning: running with this JSD interruptHook is ridiculously slow. A number of tests for Panorama are timing-dependent (drag-drop tests, for example), so some of them fail, but that’s okay: as long as a test finished without timing out, it actually did run through all the code. In order to get this to run through everything with some degree of control, I split up the mochitest tabview suite into a few chunks. I then took the multiple resulting coverage.json files and passed them into another script, tools/coverage/aggregate.py, which takes multiple JSON results like this and puts them together into a single JSON file. I then passed this aggregate JSON file to PhiliKON’s wonderful report script and, voilà, the Panorama test coverage report! Easy as pie.
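To make the aggregation step concrete, here is a minimal Python sketch of merging several coverage.json files. It is not the actual tools/coverage/aggregate.py. It assumes each file maps file paths to line numbers, and that each line number maps to a hit count (0 meaning the line was seen but never executed); the post only says that paths and line numbers are the keys, so the value format and the function names are my own assumptions.

```python
import json
import sys


def aggregate_coverage(reports):
    """Merge coverage dicts shaped like {path: {line: hits}}.

    The hit-count value format is an assumption made for illustration.
    """
    merged = {}
    for report in reports:
        for path, lines in report.items():
            target = merged.setdefault(path, {})
            for line, hits in lines.items():
                # Sum the hits recorded for the same line in different chunks.
                target[line] = target.get(line, 0) + hits
    return merged


def percent_covered(report):
    """Percentage of recorded lines that were hit at least once."""
    total = sum(len(lines) for lines in report.values())
    hit = sum(1 for lines in report.values() for n in lines.values() if n > 0)
    return 100.0 * hit / total if total else 0.0


if __name__ == "__main__":
    # Usage: python aggregate_sketch.py chunk1.json chunk2.json > all.json
    loaded = []
    for name in sys.argv[1:]:
        with open(name) as handle:
            loaded.append(json.load(handle))
    json.dump(aggregate_coverage(loaded), sys.stdout, indent=2)
```

Summing hit counts means a line missed in one chunk but hit in another still counts as covered in the combined report, which is the point of splitting the suite and merging afterwards.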

Happy Halloween!

October 29th, 2010

Happy Halloween from the Firefox Panorama team!

We carved some pumpkins a couple days ago in my department. I carved the Panorama logo above, but also one of the Stata Center.

More Jack-O-Lantern photos, including great ones of Chomsky and Norvin Richards, are up on Flickr.

Voicemail from Jesse

July 3rd, 2010

My friend Jesse left me a voicemail on my Google Voice number. Here’s a demo of the fantastic transcription feature.

Voicemail from Jesse from mitcho on Vimeo.

Every website has a purpose

June 2nd, 2010

Every website has a purpose. Maybe you want people to buy a product, donate to your cause, download your app, or subscribe to your mailing list. How can you confidently modify your site to make it more effective with respect to this goal?

A/B testing is a process by which multiple variants of a website are presented to different users randomly and statistical tools are used to see whether any variant is more effective, according to an overall goal metric such as conversions or revenue.
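As a rough illustration of the “statistical tools” involved, here is a small Python sketch of a two-proportion z-test, one common way to compare the conversion rates of two variants. This is a generic textbook test, not anything specific to ShrimpTest; the function name and the visitor numbers in the usage note are made up for the example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    conv_* are conversion counts and n_* are visitor counts.
    Returns (z, two_sided_p_value).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, two_proportion_z_test(100, 1000, 150, 1000) compares a hypothetical 10% conversion rate against a 15% one; a small p-value suggests the difference is unlikely to be due to chance alone.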

While various A/B testing products—many free—exist, none are made from the ground up to work within the WordPress ecosystem. I believe a solution made particularly for WordPress could make A/B testing so much easier and more straightforward, and that such a solution could be greatly beneficial to the platform as a whole.

I’m happy to announce my new project, code-named ShrimpTest,[1] which is directly aimed at filling this void. I’ll be working on this project this summer together with the fantastic folks at Automattic.

The best way to keep up with development is on the project’s development blog, the ShrimpTest P2. Most updates will likely be much shorter than this initial post. ;) You can get less frequent, milestone-like updates by following ShrimpTest on Twitter. Development will be open, so feel free to check it out (haha) and submit patches as well. As I go along, I’ll also look forward to your feedback.


  1. Five dollars to the first person to correctly guess why I’m calling it ShrimpTest. 

Better Linguist List RSS Feeds

April 26th, 2010

Everyone I know in linguistics uses the LINGUIST List website to a greater or lesser degree. Linguist List began as a mailing list in the ’90s, with book, job, and dissertation announcements, calls for papers, and general academic discussions.

Nowadays many people follow the various announcements on Linguist List using an RSS feed reader such as Google Reader or my personal favorite, NetNewsWire.

Unfortunately, the Linguist List RSS feeds (at least recently) don’t include the full text of the articles and have a few other quirks as well. It’s often hard to judge based on the title whether an entry is really something I’m interested in or not, so I’ve spent a lot of time frustratedly opening every possibly interesting-looking entry in a separate NetNewsWire tab. Today I decided enough was enough: I wrote a script which parses each of the Linguist List RSS feeds, finds the actual descriptions, and interleaves them.[1] It’s working remarkably well so far:

Read the rest of this entry »


  1. Veteran Linguist List RSS subscribers will also note that I’m adding the full title to the entry title for the Conferences and Calls lists as well. 

Beginning development with Jetpack SDK 0.2

April 14th, 2010

This article is a translation of a recent article in Japanese by fellow Jetpack Ambassador Gomita which was published on the Mozilla Labs Jetpack blog. I’m cross-posting it here for posterity.

Mozilla Labs recently released version 0.2 of the Jetpack SDK, which fixes some issues in the 0.1 release, such as a glitch affecting development on Windows. SDK 0.2 doesn’t include the planned APIs for rapid development of new browser functionality, but you can still play with SDK 0.2 to get a flavor for development with the Jetpack SDK.

In this article we begin by setting up an SDK 0.2 development environment and explain the steps required to develop a simple, practical add-on using SDK 0.2. The instructions here are for Windows, but the basic steps are the same on every platform.

Read the rest of this entry »

Spring is for Speaking: JSConf, WordCamp SF, IACL

March 20th, 2010

I recently confirmed three very exciting speaking gigs which I’ll be doing this spring:

Read the rest of this entry »

Jetpacking in Boston

March 13th, 2010

A couple weeks ago I gave a talk at the Boston JavaScript meetup introducing Jetpack and filling people in on the latest developments in the project, including the Reboot. Between 20 and 30 people came to the talk, which was at Microsoft Cambridge. Here are the slides from the talk:[1]

Extend the Browser with Jetpack

Read the rest of this entry »


  1. If anyone would like the Keynote deck, just let me know. 

Jetpack Ambassadors in MV

February 21st, 2010

A couple weeks ago I went out to Mozilla HQ in Mountain View for a Jetpack Ambassador meetup. Jetpack is a project at Mozilla Labs intended to make writing Firefox add-ons easier, and it shares some ancestry with the Ubiquity project dear to my heart. The Jetpack Ambassadors are a team of Mozilla community members who will be involved with Jetpack marketing, evangelizing Jetpack and writing about our own experiences working with the exciting new Jetpack architecture.

We spent a good chunk of time with a team from Invisible Elephant who came in to give us some training on making technical presentations, and then dug into the code on Day 2. It was great to have the geniuses at Mozilla Labs like Atul and Myk show us the latest Jetpack code as well as get the latest project direction from Daniel, Aza, and Nick, from which we could see the amount of careful consideration and effort that’s gone into the Jetpack reboot.

The best part of the whole experience, though, has to be the fellowship with the other Jetpack Ambassadors. The Ambassadors came from all over the world, including Europe, Asia, S. America, and of course N. America. Each of them is involved with some really exciting projects and has made a name for themselves in their own community. I’ve put together a Twitter list of all the Jetpack Ambassadors and the core team members and invite you to follow them.

We also had the greatest number of Ubiquity core developers to have ever been in the same place at the same time, which of course had to be documented. :)

(More photos can be seen in my gallery.)

I had a fantastic time in MV and it was a shame I could only be there for such a short time. I feel honored to be a part of this group and am looking forward to speaking on Jetpack soon at an event near you!

After the Deadline for Firefox

February 1st, 2010

After the Deadline is a powerful and intelligent proofreading tool which checks for spelling errors, misused words, some grammatical gaffes, and even some stylistic issues. For the past month, I’ve been working for Automattic, the company behind AtD and the makers of WordPress.com, to create a Firefox add-on which enables this superior technology everywhere on the web. Words can’t do justice to the magic that is AtD, so here’s a video we put together:

I invite you all to give it a spin:

Working on After the Deadline for Firefox gave me my first experience creating an add-on from the ground up and I’ve learned a lot. After working on Ubiquity and dabbling with Jetpack, it’s given me another perspective on extensibility on the web and I look forward to thinking and writing more about these experiences in the near future.

In the meantime, happy proofreading! :D

WordCamp Boston 2010

January 29th, 2010

This past weekend I gave a couple talks at the inaugural WordCamp Boston. WordCamps are local, community-organized events for WordPress users and enthusiasts. We had about 400 people at the Microsoft Cambridge campus.

Read the rest of this entry »

Creating an image-sized iframe overlay with Shadowbox

January 13th, 2010

I have recently been working with the Shadowbox JavaScript library for an upcoming revision to the MIT Edgerton Digital Collections website. Shadowbox is a nice lightbox library with a modular design that works with various JavaScript libraries like jQuery, Prototype, and MooTools.

Shadowbox is organized around different “players”, one for each kind of media that will be displayed. The library by default comes with players for Flash, HTML fragments, iframes, QuickTime, and Windows Media. Some of these players, like those for images and video, automatically recognize the media size and adjust the lightbox accordingly, while others such as the iframe player can use a set size or can fill the screen. For the Edgerton site, though, we needed to display an iframe at the dimensions of a given image, so that we could display the image with an overlay. Here are some notes on how to implement a custom player for Shadowbox.

Read the rest of this entry »

Disgusting Word-formatted HTML and how to fix it

December 30th, 2009

In working on a new website for the MIT Working Papers in Linguistics, I recently inherited a collection of HTML files with all of our books’ abstracts. To my dismay (but not surprise) the markup in these files was horrendous. Here are some of the cardinal sins of markup that I saw committed in these files:

  1. Confusing ids and classes. ids should be unique on the page… but here the same id is used over and over in order to format elements the same way.
<div id="indent"> <div id="number">4.2.1</div> <div id="page">161</div> <div id="section">Old French (Adams 1987)</div>
</div> <div id="indent"> <div id="number">4.2.2</div> <div id="page">164</div> <div id="section">The evolution of the dialects of northern Italy</div>
  2. Putting a class on every instance of something. Every paragraph should be formatted equivalently. We get the point.
<p class=MsoNormal><b>The English Noun Phrase in Its Sentential Aspect</b></p>
<p class=MsoNormal>Steven Paul Abney</p>
<p class=MsoNormal>May 1987</p>
  3. Using blank space for formatting.
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
  4. CSS styles that don’t exist. Browsers just ignore these anyway…
<p class=MsoNormal>One factor in determining which worlds a modal quantifies
over is the temporal argument of the modal’s accessibility relation.<span
style='mso-spacerun:yes'>  </span>It is well-known that a higher tense affects
the accessibility relation of modals.<span style='mso-spacerun:yes'> 
</span>What is not well-known is that there are aspectual operators high enough
to affect the accessibility relation of modals.<span style='mso-spacerun:yes'> 
</span>

The solution

My solution was to write a Perl script which takes care of a number of these issues. It’s not foolproof and doesn’t involve any voodoo (for example, it can’t retypeset things which were formatted using whitespace), but it does a good job as a first pass.

You can run the script by making it executable (chmod +x cleanwordhtml.pl) then specifying a target filename as an argument. For example,

./cleanwordhtml.pl source.html > clean.html

I used this with a simple bash for loop to run over all my files:

for f in */*.html; do ./cleanwordhtml.pl "$f" > "${f%.html}-clean.html"; done
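The Perl script itself isn’t reproduced here, so as a rough idea of what this kind of cleanup involves, here is a small Python sketch that handles three of the sins listed above: Mso* classes, empty spacer paragraphs, and spans that only carry mso-* styles. The function name and the regular expressions are my own illustration, not a port of cleanwordhtml.pl, and regexes like these are only a first pass rather than a real HTML parser.

```python
import re


def clean_word_html(html):
    """First-pass cleanup of Word-exported HTML (illustrative only)."""
    # Drop empty spacer paragraphs such as <p class=MsoNormal><o:p>&nbsp;</o:p></p>.
    html = re.sub(
        r"<p[^>]*>\s*<o:p>(?:&nbsp;|\s)*</o:p>\s*</p>", "", html, flags=re.I
    )
    # Remove any remaining Office namespace tags, keeping their contents.
    html = re.sub(r"</?o:[^>]*>", "", html, flags=re.I)
    # Replace whitespace-only spans carrying an mso-* style with one space.
    html = re.sub(
        r"<span\s+style=(['\"])mso-[^'\"]*\1\s*>\s*</span>", " ", html, flags=re.I
    )
    # Strip class attributes whose value starts with "Mso", quoted or not.
    html = re.sub(r"\s+class=(['\"]?)Mso\w+\1", "", html, flags=re.I)
    return html
```

Like the script described above, this can’t retypeset anything that was laid out with whitespace, and it leaves the repeated ids untouched, since fixing those means deciding what the classes should be.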

Hopefully someone else can benefit from my experience.


© 2006–2011 mitcho (Michael 芳貴 Erlewine).
Proudly powered by WordPress on Media Temple.
The views expressed on these pages are mine alone and do not
reflect those of my employers and clients, past and present.