mitcho Michael 芳貴 Erlewine

Linguist. Fifth year PhD student at MIT.


Posts Tagged ‘algorithm’

Exploring Command Chaining in Ubiquity: Part 1

Wednesday, August 19th, 2009

Since the dawn of time people have been asking about command chaining in Ubiquity. If you have a translate command and an email command, it would be great to be able to, for example, translate hello to Spanish and email to Juanito. This is what we call command chaining or piping: in a single complex query, specifying multiple (probably two) actions and using the first’s output as the second’s input.[1]

Today I hope to cover some of the technical considerations required in implementing command chaining in Ubiquity, and I will follow up soon with a blog post on the linguistic considerations required as well.
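To make the idea concrete, here is a minimal sketch of piping as plain function composition. It is written in PHP only to match the other code on this site. The translate_command and email_command functions and the chain helper are made-up stand-ins for illustration, not Ubiquity's actual verbs or API.

```php
<?php
// Hypothetical stand-in for a translate verb: looks the text up in a tiny dictionary.
function translate_command(string $text, string $language): string {
    $dictionary = ['Spanish' => ['hello' => 'hola']];
    return $dictionary[$language][$text] ?? $text;
}

// Hypothetical stand-in for an email verb: returns the message instead of sending it.
function email_command(string $body, string $recipient): string {
    return "To: $recipient\n\n$body";
}

// Chaining: the output of the first command becomes the input of the second.
function chain(callable $first, callable $second): callable {
    return function ($input) use ($first, $second) {
        return $second($first($input));
    };
}

// "translate hello to Spanish and email to Juanito"
$translateAndEmail = chain(
    function ($text) { return translate_command($text, 'Spanish'); },
    function ($text) { return email_command($text, 'Juanito'); }
);

echo $translateAndEmail('hello'), "\n";
```

The second command never sees the original input, only what the first command produced, which is what separates true piping from two simultaneous commands.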


  1. We’re going to limit our discussion here to cases where the two verbs are not simply two simultaneous commands, but two commands which operate successively on an input, i.e., true piping. This rules out, for example, input such as google dogs and translate cat to Spanish, as the second command’s execution does not semantically depend on the first’s execution. This (hopefully uncontroversial) decision also affects the linguistic considerations to be made in my next post.

Nountype Quirks: Day 3: Geo Day

Saturday, August 1st, 2009

It’s time for one more installment of Nountype Quirks, where I review and tweak Ubiquity’s built-in nountypes. For an introduction to this effort, please read Judging Noun Types and my updates from Day 1 and Day 2.

Today I ended up spending most of the day attempting to implement (but not yet completing) major improvements to the geolocation-related nountypes whose plans I lay out here.

Note: this blog post includes a number of graphs using HTML/CSS formatting. If you are reading this article through a feed reader or planet, I invite you to read it on my site.

Nountype Quirks: Day 2

Thursday, July 30th, 2009

Today I’m continuing the process of reviewing and tweaking all of the nountypes built into Ubiquity. For a more respectable introduction to this endeavor, please read my blog post from a couple of days ago, Judging Noun Types, and my status update from yesterday, Nountype Quirks: Day 1.

Note: this blog post includes a number of graphs using HTML/CSS formatting. If you are reading this article through a feed reader or planet, I invite you to read it on my site.


Nountype Quirks: Day 1

Wednesday, July 29th, 2009

Today I began the process of going through all of the nountypes built into Ubiquity using the principles and criteria I laid out yesterday, a task I’ve had in planning for a while now. As I explained yesterday, improved suggestions and scoring from the built-in nountypes could directly translate to better and smarter suggestions, resulting in a better experience for all users. Here I’ll document some of the nountype quirks I’ve discovered so far and what remedy has been implemented or is planned.

Note: this blog post includes a number of graphs using HTML/CSS formatting. If you are reading this article through a feed reader or planet, I invite you to read it on my site.


Scoring for Optimization

Friday, April 24th, 2009

Suppose you have a number of competing candidates, each of which can be ranked with a score, but it takes a little time to calculate each candidate’s score. You’re only interested in the top [latex]n[/latex] candidates. You want to come up with a scoring scheme where you can throw the extra candidates out of consideration earlier without sacrificing quality. Such is the problem of scoring and ranking suggestions in Ubiquity. What properties must such a scoring system have?

This blog post includes a lot of complex CSS-formatted graphs which may be best viewed in — what else? — Firefox. You may also want to access this blog post directly rather than through a planet.

candidate 8  
candidate 2  
candidate 9  
candidate 3  
candidate 10 CUTOFF
candidate 5 
candidate 1 
candidate 7 

One portion of the problem description above merits clarification: I define “without sacrificing quality” to mean that, if we did not throw out any candidates early and waited until all the scores were computed fully and accurately, we would still get the same top [latex]n[/latex] winners. This already gives us the key insight towards an appropriate solution: we can only throw out a candidate when we know that it has no further chance of making it into the top [latex]n[/latex] candidates.
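Here is a rough sketch of that insight. It assumes, purely for illustration, that a candidate’s final score is a product of factors between 0 and 1, so that the running product can only go down and is therefore an upper bound on the final score. The top_n function and the sample candidates are hypothetical and are not Ubiquity’s actual scoring code.

```php
<?php
// Assumption: every factor lies in [0, 1], so the partial product is an
// upper bound on the final score and can only shrink as factors are applied.
function top_n(array $candidates, int $n): array {
    $best = []; // candidate name => final score, never more than $n entries

    foreach ($candidates as $name => $factors) {
        // Lowest score currently holding a top-n spot (0 while spots are still free).
        $cutoff = count($best) < $n ? 0.0 : min($best);

        $score = 1.0;
        $pruned = false;
        foreach ($factors as $factor) {
            $score *= $factor; // imagine each factor being expensive to compute
            if ($score < $cutoff) {
                // The upper bound is already below the cutoff, so this
                // candidate can no longer make it into the top n.
                $pruned = true;
                break;
            }
        }
        if ($pruned) {
            continue;
        }

        $best[$name] = $score;
        arsort($best);                           // highest score first
        $best = array_slice($best, 0, $n, true); // keep only the top n
    }

    return $best;
}

$candidates = [
    'a' => [0.9, 0.9],
    'b' => [0.5, 0.5],
    'c' => [0.8, 1.0],
    'd' => [0.1, 1.0],
];
print_r(array_keys(top_n($candidates, 2))); // a and c
```

Candidate d is dropped after its first factor, yet the winners are the same ones a full computation would give, because a candidate is only discarded once its best possible score is below the current cutoff.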


Scoring and Ranking Suggestions

Tuesday, April 7th, 2009

I just spent some time reviewing how Ubiquity currently ranks its suggestions in relation to Parser The Next Generation, so I thought I’d put some of these thoughts down in writing.

The issue of ranking Ubiquity suggestions can be restated as predicting an optimal output given a certain input and various conflicting considerations. Ubiquity (1.8, as of this writing) computes four “scores” for each suggestion:


Ubiquity Parser: The Next Generation Demo

Wednesday, March 18th, 2009


A week or two ago while visiting California, Jono and I had a productive charrette, resulting in a new architecture proposal for the Ubiquity parser, as laid out in Ubiquity Parser: The Next Generation. The new architecture is designed to support (1) the use of overlord verbs, (2) writing verbs by semantic roles, and (3) better suggestions for verb-final languages and other argument-first contexts. I’m happy to say that I’ve spent some time putting a proof-of-concept together.

I’ve implemented the basic algorithm of this parser for left-branching languages (like English) and also implemented some fake English verbs, noun types, and semantic roles. This demo should give you a basic sense of how this parser will attempt to identify different types of arguments and check their noun types even without clearly knowing the verb. This should make the suggestion ranking much smarter, particularly for verb-final contexts. (For a good example, try from Tokyo to San Francisco.)

➔ Check out the Ubiquity next-gen parser demo


External orders in WordPress queries

Saturday, November 29th, 2008

The advanced WordPress user is intimately familiar with query_posts, the function which controls which posts are displayed in “The Loop.” query_posts gives plugin and theme writers the ability to display only posts written in January (query_posts("monthnum=1")) or disallow posts from a certain category (query_posts("cat=-529")[1]). One of the parameters you can set here is orderby, which affects the ordering of the posts returned, with allowed values such as author, date, or title. But what if you want to order your posts in some other order, defined outside of your wp_posts table? Here I’m going to lay out some thoughts on rolling your own external ordering source for WordPress queries.

In order to introduce an external ordering source, we need to do four things: 1. create the external ordering source, 2. hook up (read “join”) the external ordering source, 3. make sure we use that order, and 4. make it play nice. ^^

By the way, I’m going to assume you, dear reader, are PHP-savvy, proficient in MySQL, and already know a little about WordPress. This how-to is not for the PHPhobic.
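As a taste of steps 2 and 3, here is a minimal sketch using WordPress’s posts_join and posts_orderby filters, which receive the JOIN and ORDER BY fragments of the query as strings. The table name wp_external_order, its post_id and sort_key columns, and the two function names are assumptions made up for this example, and this is not necessarily the approach the rest of this post ends up taking.

```php
<?php
// Hypothetical external ordering source: a table mapping post_id to sort_key.
const EXTERNAL_ORDER_TABLE = 'wp_external_order';

// Step 2: join the external table onto the posts query.
// 'wp_posts' is hardcoded to keep the sketch standalone; inside WordPress
// you would use $wpdb->posts so that custom table prefixes keep working.
function external_order_join(string $join): string {
    return $join . ' LEFT JOIN ' . EXTERNAL_ORDER_TABLE
        . ' AS ext ON (wp_posts.ID = ext.post_id) ';
}

// Step 3: sort by the external key first and fall back to the original order.
// Posts with no entry in the external table are pushed to the end.
function external_order_orderby(string $orderby): string {
    $external = 'ext.sort_key IS NULL, ext.sort_key ASC';
    return $orderby === '' ? $external : $external . ', ' . $orderby;
}

// Only register the filters when actually running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('posts_join', 'external_order_join');
    add_filter('posts_orderby', 'external_order_orderby');
}
```

Registered like this, the filters would touch every posts query on the site, which is exactly where step 4, making it play nice, comes in.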


  1. This, incidentally, is precisely what I do to hide, by default, my tweets in my index.php and archives.php

Yet Another Related Posts Plugin 2.0

Sunday, July 13th, 2008

Well, it’s been a while since I updated my plugin YARPP—in my humble opinion the best related posts plugin for WordPress. ^^ Today I release version 2.0, incorporating a number of important requests and bug fixes:

  • New algorithm which considers tags and categories, by frequent request
  • Order by score, date, or title, by request
  • Excluding certain tags or categories, by request
  • Sample output displayed in the options screen
  • Bugfix: an excerpt length bug
  • Bugfix: now compatible with the following plugins:
    • diggZEt
    • WP-Syntax
    • Viper’s Video Quicktags
    • WP-CodeBox
    • WP shortcodes

Check out the Yet Another Related Posts Plugin page on this site, the page on, or download it directly now!

Yet Another Related Posts Plugin

Saturday, December 29th, 2007


This posting is now outdated… for the latest information on YARPP, please visit YARPP’s very own page on my site, or its page on If you have questions, please submit on the forum. Thanks!


Today I’m releasing Yet Another Related Posts Plugin (YARPP[1]) 1.0 for WordPress. It’s the result of some tinkering with Peter Bowyer’s version of Alexander Malov & Mike Lu’s Related Entries plugin. Modifications made include:

  1. Limiting by a threshold: Peter Bowyer did the great work of making the algorithm use MySQL’s fulltext search score to identify related posts. But it just displayed, for example, the top 5 most “relevant” entries, even if some of them weren’t at all similar. Now you can set a threshold limit[2] for relevance, and you get more related posts if there are more related posts and fewer if there are fewer. Ha!
  2. Being a better plugin citizen: now it doesn’t require the user to click some sketchy button to alter the database and enable a fulltext key. Using register_activation_hook, it does it automagically on plugin activation. Just install and go!
  3. Miscellany: a nicer options screen, displaying the fulltext match score on output for admins, an option to allow related posts from the future, a couple bug fixes, etc.
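The threshold idea from item 1 can be sketched as follows. This is a simplified illustration rather than YARPP’s actual code: the filter_related function is hypothetical, the scores are invented, and in the plugin the score comes from MySQL’s fulltext search rather than from a PHP array.

```php
<?php
// $scores maps post ID => relevance score (invented numbers in the example below).
// Keep only posts at or above the threshold, best first, and at most $limit of them.
function filter_related(array $scores, float $threshold, int $limit): array {
    $related = array_filter($scores, function ($score) use ($threshold) {
        return $score >= $threshold;
    });
    arsort($related);                            // highest score first, IDs preserved
    return array_slice($related, 0, $limit, true);
}

$scores = [12 => 7.5, 34 => 1.2, 56 => 5.1, 78 => 9.0];
print_r(array_keys(filter_related($scores, 5.0, 5))); // 78, 12, 56
```

With a limit of 5 but only three posts over the threshold, you get three related posts instead of five padded out with weak matches.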


Just put it in your /wp-content/plugins/ directory, activate, and then drop the related_posts function in your WP loop. Change any options in the Related Posts (YARPP) Options pane in Admin > Plugins.

You can override any options in an individual instance of related_posts using the following syntax:

related_posts(limit, threshold, before title, after title, show excerpt, len, before excerpt, after excerpt, show password-protected posts, past only, show score);

Most of these should be self-explanatory. They’re also in the same order as the options on the YARPP Options pane.

Example: related_posts(10, null, 'title: ') changes the maximum related posts number to 10, keeps the default threshold from the Options pane, and adds title: to the beginning of every title.

There’s also a related_posts_exist() function. It has three optional arguments to override the defaults: a threshold, the past only boolean, and the show password-protected posts boolean.


For a barebones setup, just drop <?php related_posts(); ?> right after <?php the_content() ?>.

On my own blog I use the following code with <li> and </li> as the before/after entry options:

<?php if (related_posts_exist()): ?>
<p>Related posts:</p>
<ol>
<?php related_posts(); ?>
</ol>
<?php else: ?>
<p>No related posts.</p>
<?php endif; ?>

Coming soon (probably)

  1. Incorporation of tags and categories in the algorithm. I’ve gotten the code working, but I still need to think about what the most natural algorithm would be for weighing these factors against the MySQL fulltext score currently used (which works pretty well, I must say).
  2. Um, something else! Let me know if you have any suggestions for improvement. ^^

Version log

1.0 Initial upload (20071229)

1.0.1 Bugfix: 1.0 assumed you had Markdown installed (20080105)

  1. Pronounced “yarp!”, kind of like this, but maybe with a little more joy:

  2. Did you know that threshold has only two h’s!? I’m incensed and just went through and replaced all the instances of threshhold in my code. It’s really not a thresh-hold!?