
Posts Tagged ‘code’

The 北京话儿 Beijing Pirate T-shirt

Saturday, June 14th, 2008

Speaking of t-shirts, I’d been toying with a t-shirt idea for the past year or two: a Beijing Pirate t-shirt. Let me explain…

A distinctive feature of the Beijing dialect of Mandarin (and, indeed, of most northern Chinese dialects) is very frequent rhoticization (adding “arr” to the end of a word, or replacing the word's final sound with it), whose function is often glossed as a diminutive suffix. This phenomenon is called 儿化 (érhuà) in Chinese. Here are some examples, blatantly stolen from Wikipedia:

  • 公园 (gōngyuán) (public garden) → 公园儿 (gōngyuánr), pronounced “gōngyuár”
  • 小孩 (xiǎohái) (small child) → 小孩儿 (xiǎoháir), pronounced “xiǎohár”
  • 事 (shì) (thing) → 事儿 (shìr), pronounced “shèr”

The result of this variation is that it makes you sound like a pirate… and thus my t-shirt idea was born:

Beijing Pirate shirt

(more…)

Markdown for WordPress and bbPress

Wednesday, May 21st, 2008

Like many others, I am a big fan of John Gruber’s Markdown, a simple formatting syntax for entering text in a clean, legible plain-text fashion and outputting it to (X)HTML. Michel Fortin did a fabulous job of porting the Markdown engine to PHP, making it a plugin for WordPress, bBlog, and TextPattern.

I’ve been using Markdown for all my blog posts here. Recently, though, I was in charge of a bbPress bulletin board (the “less is more” sister project to WordPress) for the Shoreland Scav Hunt team, and wanted to use Markdown formatting there. And I wasn’t the only one wanting to do this.

With some experimenting and research into the filters in the bbPress text flow (different from the WordPress one), I was able to make Markdown work in bbPress. This involved adding a special bbPress plugin wrapper to Michel Fortin’s PHP Markdown Extra. I’ve re-released this plugin as Markdown for WordPress and bbPress, available at both wordpress.org and bbpress.org. Enjoy!

Testing Google’s Language Detection

Saturday, May 17th, 2008

google code

As Google adds ten more languages to its machine translation service, it seems to be on its way to becoming the most convenient universal translator of the world’s popular languages. Google’s handling of languages of course isn’t perfect, however—in particular, I’ve been complaining to friends for a while about the weaknesses of Google’s handling of queries in Chinese character (漢字/汉字) scripts. In this post, I run some tests using Google’s Language Detection service to try to better understand its handling of Chinese character queries.

Background

Chinese characters have been used all across East Asia, most notably in Chinese, Japanese, Korean, and Vietnamese (the “CJKV”). Prescriptivist writing reforms in Communist China and Japan have simplified many characters, though. Some characters were simplified in the same way, some in different ways, and some in only one country but not the other. For more information, there’s Wikipedia or Ken Lunde’s CJKV Information Processing.

The problem

The issue comes up when you search for a word in Chinese characters that clearly came from one particular Chinese character-using language. In my experience, Google doesn’t use the query to infer which language you are writing in, and returns many results in other Chinese character-using languages as well.1
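To make the problem concrete, here is a minimal PHP sketch of the most naive approach: check which Unicode scripts appear in the query. This is not Google’s method (which isn’t public); the function name guess_cjk_language and its return codes are hypothetical, and it assumes PHP’s PCRE supports the /u modifier and \p{Han}-style script names. Kana give Japanese away and Hangul gives Korean away, but a query written only in Chinese characters carries no script signal at all.

```php
<?php
// Hypothetical sketch, not Google's method: guess a CJK language purely from
// the Unicode scripts present in a string.

function guess_cjk_language(string $text): string
{
    // Count characters belonging to a given Unicode script.
    $count = function (string $script) use ($text): int {
        return preg_match_all('/\p{' . $script . '}/u', $text);
    };
    if ($count('Hiragana') + $count('Katakana') > 0) {
        return 'ja'; // kana are only used in Japanese
    }
    if ($count('Hangul') > 0) {
        return 'ko';
    }
    if ($count('Han') > 0) {
        return 'ambiguous'; // Chinese characters alone could be zh, ja, ko, vi...
    }
    return 'unknown';
}

echo guess_cjk_language('漢字'), "\n";
```

The last case is exactly where a search engine would need other clues, such as the user’s interface language or location.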

(more…)

Scav Hunt!

Thursday, May 8th, 2008

Introduction

It’s that time of the year again—Mother’s Day weekend—and that means Scav Hunt! Every year at the University of Chicago we have a huge Scavenger Hunt (a.k.a. “Scav,” or “The Hunt”). On Wednesday night at midnight, a list of roughly 300 items is released in some obfuscated fashion. The items are to be presented three days later, on Judgement Day (Sunday). While some items are simply rare and must be found, most are some sort of construction, production, or art project. There are also some other scav staples: some of the items make up the Scav Olympics, the Party on the Quads, Scav All Stars, and the Road Trip.

(more…)

Display your Last.fm rankings using PHP 4’s XSLT support

Friday, February 1st, 2008

With all the exciting recent news about Last.fm, I thought I would document a simple bit of code I added to my site the other day.

Last.fm offers a number of Flash-based widgets you can add to your website. Unfortunately, this doesn’t give you much flexibility and, of course, requires Flash. But you, dear friend, have a site written in PHP, and the rankings are just XML files. There is a better way.

Looking around on the web, there are some good instructions and recommendations for using PHP 5’s object-oriented XML support. But, as we know, not everyone is using PHP 5. Here’s what I did on my PHP install, which includes the DOM/XML and DOM/XSLT extensions.1

Write your XSL Transformation

The first step is to write your XSL Transformation, or XSLT, a special XML “program” that takes an XML file and reformats it into another XML file. Remember learning in algebra class what a function was? An XSLT defines a function from XML to XML. In our case, we need to take a special proprietary XML file like this one for my weekly top artists and return some solid XHTML.

Here’s the XSLT I used (it helps to look at the original XML file at the same time):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Chart root: open an ordered list and render the first five artists -->
  <xsl:template match="weeklyartistchart">
    <ol>
      <xsl:apply-templates select="artist[position() &lt; 6]"/>
    </ol>
  </xsl:template>
  <!-- Each artist: a linked name followed by its play count -->
  <xsl:template match="artist">
    <li><a><xsl:attribute name="href"><xsl:value-of select="./url"/></xsl:attribute><xsl:value-of select="./name"/></a><span><xsl:value-of select="./artist"/></span>: <xsl:value-of select="./playcount"/></li>
  </xsl:template>
</xsl:stylesheet>

The XSLT spec will give you all the juicy details, but you really only need a few basics (or you can just steal my code). First, note the <xsl:template match="weeklyartistchart"> and <xsl:template match="artist"> items. Each of these blocks describes what to do with each <weeklyartistchart> or <artist> node, respectively, in the input XML. In the code above, when a <weeklyartistchart> is found, an ordered list is opened and the artist template is applied to the first five <artist> nodes (position() < 6, written as position() &lt; 6 because a literal < is not allowed inside an XML attribute). In the artist template (<xsl:template match="artist">), the script takes each <artist> and prints a list item with the name of the artist, linked to the URL given in that <artist>’s <url> subnode and followed by its play count.

Process the XML with your XSLT

Once your XSLT is written, save it to a file, like last.fm.xsl. Now we’ll use the DOM/XSLT extension to apply this XSLT file to the live weekly artist chart XML file from Last.fm. Here’s the code I used:

$chartxml = domxml_open_file("/weeklyartistchart.xml");
$xslt = domxml_xslt_stylesheet_file("last.fm.xsl");
$charthtml = $xslt->process($chartxml);
echo $charthtml->dump_mem();

The code is pretty self-explanatory, just four lines:

  1. open the remote XML file using domxml_open_file,
  2. open the stylesheet (XSL transformation),
  3. apply the stylesheet to the remote XML file, and
  4. echo the output.

That’s it! You can see the results here on my music page.
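On PHP 5 and later, where the domxml_* functions are no longer bundled, a similar list can be built without XSLT. Below is a minimal sketch using SimpleXML; it assumes the chart layout the stylesheet above matches (each <artist> has <name>, <playcount>, and <url> children). The sample document, its user attribute, and the function name render_top_artists are hypothetical, and the stylesheet’s empty <span> is left out.

```php
<?php
// Hypothetical PHP 5+ sketch: render the top artists from a weekly artist
// chart with SimpleXML instead of DOM XML + XSLT.

function render_top_artists(string $xml, int $count = 5): string
{
    $chart = simplexml_load_string($xml);
    $html = "<ol>\n";
    $i = 0;
    foreach ($chart->artist as $artist) {
        if ($i++ >= $count) {
            break; // same effect as position() < 6 when $count is 5
        }
        $html .= sprintf(
            "<li><a href=\"%s\">%s</a>: %d</li>\n",
            htmlspecialchars((string) $artist->url),
            htmlspecialchars((string) $artist->name),
            (int) $artist->playcount
        );
    }
    return $html . "</ol>\n";
}

// Made-up sample in the assumed chart format.
$sample = <<<XML
<weeklyartistchart user="example">
  <artist><name>Artist A</name><playcount>42</playcount><url>http://www.last.fm/music/Artist+A</url></artist>
  <artist><name>Artist B</name><playcount>17</playcount><url>http://www.last.fm/music/Artist+B</url></artist>
</weeklyartistchart>
XML;

echo render_top_artists($sample);
```

The trade-off is that the markup now lives in PHP rather than in a separate stylesheet you can edit on its own.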


  1. To see if these instructions will work for you, check your phpinfo for the lines “DOM/XML enabled” and “DOM/XSLT enabled”. If the items aren’t even showing up, you’re out of luck. :( There are, however, other comparable methods to process XML and XSLT in PHP 4.
    dom/xml check 

Modifying WordPress plugin activation behavior

Saturday, January 5th, 2008

As I continue to work on and debug Yet Another Related Posts Plugin and WP-Smartdate, I’ve come across an issue where plugin activation fails, but I get no useful error message.

When I try to activate the plugin, I am redirected to a URL of the form /plugins.php?error=true&plugin=...&_error_nonce=.... This redirect just gives me the plugins control panel with my plugin still deactivated, and with no useful error message.1 This is apparently an issue with the Plugin Protection mechanism introduced in WP 2.2. A quick fix (hack) is available on the WP forums.

Here’s hoping this helps some people scratching their heads, and that this behavior is reconsidered/fixed in future releases.


  1. Apparently some people get a message like “Plugin could not be activated because it triggered a fatal error.” 

Yet Another Related Posts Plugin

Saturday, December 29th, 2007

UPDATE:

This posting is now outdated… for the latest information on YARPP, please visit YARPP’s very own page on my site, or its page on wordpress.org. If you have questions, please post them on the wordpress.org forum. Thanks!

Description

Today I’m releasing Yet Another Related Posts Plugin (YARPP1) 1.0 for WordPress. It’s the result of some tinkering with Peter Bowyer’s version of Alexander Malov & Mike Lu’s Related Entries plugin. Modifications made include:

  1. Limiting by a threshold: Peter Bowyer did the great work of making the algorithm use MySQL’s fulltext search score to identify related posts. But his version always displayed, for example, the top 5 most “relevant” entries, even if some of them weren’t at all similar. Now you can set a threshold limit2 for relevance, so you get more related posts when there are more related posts and fewer when there are fewer. Ha!
  2. Being a better plugin citizen: now it doesn’t require the user to click some sketchy button to alter the database and enable a fulltext key. Using register_activation_hook, it does it automagically on plugin activation. Just install and go!
  3. Miscellany: a nicer options screen, displaying the fulltext match score on output for admins, an option to allow related posts from the future, a couple bug fixes, etc.
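The threshold idea in item 1 boils down to a filter-then-sort step. Here is a minimal PHP sketch of it, assuming the relevance scores have already been fetched (in YARPP they come from MySQL’s fulltext match score). This is not the plugin’s actual code; select_related, the post IDs, and the scores are hypothetical.

```php
<?php
// Hypothetical sketch, not YARPP's code: keep only candidates whose relevance
// score reaches the threshold, best first, capped at $limit. The scores stand
// in for MySQL fulltext match scores fetched elsewhere.

function select_related(array $scores, float $threshold, int $limit): array
{
    $kept = array_filter($scores, function ($score) use ($threshold) {
        return $score >= $threshold;
    });
    arsort($kept);                              // highest score first, post IDs kept as keys
    return array_slice($kept, 0, $limit, true); // true preserves the integer keys
}

$scores = [101 => 8.2, 102 => 0.4, 103 => 5.1, 104 => 6.7]; // post ID => score
print_r(select_related($scores, 5.0, 5)); // three posts clear the bar, so three are shown
print_r(select_related($scores, 9.0, 5)); // none clear it, so nothing is shown
```

The limit still acts as a ceiling, but the threshold decides how many posts actually make it in.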

Installation

Just put it in your /wp-content/plugins/ directory, activate, and then drop the related_posts function in your WP loop. Change any options in the Related Posts (YARPP) Options pane in Admin > Plugins.

You can override any options in an individual instance of related_posts using the following syntax:

related_posts(limit, threshold, before title, after title, show excerpt, len, before excerpt, after excerpt, show pass posts, past only, show score);

Most of these should be self-explanatory. They’re also in the same order as the options on the YARPP Options pane.

Example: related_posts(10, null, 'title: ') changes the maximum related posts number to 10, keeps the default threshold from the Options pane, and adds title: to the beginning of every title.
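The null-means-default behavior in that example can be pictured with the minimal PHP sketch below. It is not YARPP’s code: related_posts_sketch, merge_related_options, the option keys, and the default values are hypothetical stand-ins for the settings on the Options pane, and only the first four positional arguments are modeled.

```php
<?php
// Hypothetical sketch, not YARPP's code: positional overrides in which null
// means "keep the default from the Options pane". Keys and defaults are invented.

const RELATED_DEFAULTS = [
    'limit'        => 5,
    'threshold'    => 5.0,
    'before_title' => '',
    'after_title'  => '',
];

function merge_related_options(array $args): array
{
    $options = RELATED_DEFAULTS;
    $names = array_keys(RELATED_DEFAULTS);
    foreach (array_values($args) as $i => $value) {
        // Skip nulls (keep the default) and extra arguments this sketch does not model.
        if ($value !== null && $i < count($names)) {
            $options[$names[$i]] = $value;
        }
    }
    return $options;
}

// A variadic front end in the style of related_posts().
function related_posts_sketch(...$args): array
{
    return merge_related_options($args);
}

// Mirrors the call related_posts(10, null, 'title: ').
$opts = related_posts_sketch(10, null, 'title: ');
print_r($opts);
```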

There’s also a related_posts_exist() function. It has three optional arguments to override the defaults: a threshold, the past only boolean, and the show password-protected posts boolean.

Examples

For a barebones setup, just drop <?php related_posts(); ?> right after <?php the_content() ?>.

On my own blog I use the following code with <li> and </li> as the before/after entry options:

Related posts:

No related posts.

Coming soon (probably)

  1. Incorporation of tags and categories in the algorithm. I’ve gotten the code working, but I still need to think about the most natural way to weigh these factors against the MySQL fulltext score currently used (which works pretty well, I must say).
  2. Um, something else! Let me know if you have any suggestions for improvement. ^^

Version log

1.0 Initial upload (20071229)

1.0.1 Bugfix: 1.0 assumed you had Markdown installed (20080105)


  1. Pronounced “yarp!”, kind of like this, but maybe with a little more joy:
     

  2. Did you know that threshold has only two h’s!? I’m incensed and just went through and replaced all the instances of threshhold in my code. It’s really not a thresh-hold!? 

dvipng color trouble

Friday, December 28th, 2007

I have yet to find a fix for the dvipng discoloration mystery I ran into back at The Academic Approach, even with the latest MacTeX version, so I’m going to repost the problem here.

In May 2007, I wrote the following to the OS X TeX mailing list:1

Hi all,

I’ve recently run into what I believe is a rare bug in dvipng: here’s the setup. (To play along, you can get my test files: http://mitcho.com/discolor.zip .) I am using MacTeX… in fact, it’s today’s release.

The LaTeX source file (discolored.tex) loads just two packages: color and graphicx. The body does two things: an \includegraphics with a local PNG file (with the bb option to specify the BoundingBox explicitly) and a \textcolor command introducing some green text, using the green defined there.

pdflatex produces the expected result: the figure and the green text.

Original color

But when you run the following commands…

latex discolored.tex

dvipng -D 200 discolored.dvi

you get a PNG (discolored1.png) which shows the text in a brownish color… the green is gone!!

Discolored version

There are two quick ways to fix this that I’ve found: one is to not include the image… if you comment the \includegraphics command out, the color comes out fine. The second is to not specify a -D (output resolution) parameter in the dvipng… this also gives you the expected output. However, in my current project, neither of these are available options…

I am frankly not very familiar with the inner workings of dvipng… does anyone have any thoughts? Can this bug be reproduced?

Any hints or suggestions are welcome!

Ditto.

discolor.zip

  1. Inline cropped images added to this web version, just to spice things up. 

Introducing Smartdate

Tuesday, November 27th, 2007

I have recently been working on a WordPress plugin called WP-Smartdate, and I’m happy to say that it is hosted at wordpress.org starting today. As some people have noticed, my blog has recently included little links on words like “yesterday,” with a machine-readable version of the date reference (called a “microformat” in the biz). Download the plugin and get started!

WP-Smartdate 0.1

This blog post describes release 0.1… For the latest description, check out the WP-Smartdate plugin page or mitcho.com/code.

Please comment! I would love to hear your feedback on the plugin.

Description

WP-Smartdate looks for relative date expressions in your blog posts, such as “tomorrow,” “this coming Monday,” “last Friday,” and adds the date reference (like “2007-11-26”) as a machine-readable microformat.

Why Smartdate?

WP-Smartdate was created for three simple audiences:

  1. For the machine: While many professional information retrieval algorithms go far beyond the scope of this program, smartdate helps the process along by adding machine-readable tags to relative date expressions.1 In addition, these machine tags, in turn, help the human: a search on Google for “November 7th, 2007,” for example, will not pull up a document talking about “yesterday,” written on the 8th, but it will pull up the smartdate output of 2007-11-07.
  2. For the human reader: Blog posts are often written in the “now,” using relative time expressions without concern for how the text will be read in the future. WP-Smartdate makes such posts easier to read and comprehend temporally2.
  3. For me: Because I think this sort of thing is fun!

A typology of smartdate date expressions

The following types of expressions are resolved with respect to the speech time—in WP-Smartdate’s case, the blog post date.

  1. simple references: “yesterday,” “today,” “tomorrow”
  2. next/last DOTW (day of the week) expressions: “next Friday,” “this past Sunday,” “this Monday”
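As a rough illustration of resolving these expressions against the post date, here is a minimal PHP sketch. It is not WP-Smartdate’s code: the function names are hypothetical, only “yesterday,” “today,” “tomorrow,” and “next/last <weekday>” are handled, “next Friday” is assumed to mean the first Friday strictly after the post date (and “last Friday” the most recent one strictly before it), and the output is a bare abbr element with the ISO date in its title, without any microformat class name.

```php
<?php
// Hypothetical sketch, not WP-Smartdate's code: resolve a few relative date
// expressions against the post date and wrap them in an abbr element.

const WEEKDAYS = [
    'monday' => 1, 'tuesday' => 2, 'wednesday' => 3, 'thursday' => 4,
    'friday' => 5, 'saturday' => 6, 'sunday' => 7,
];

function resolve_relative_date(string $expr, DateTimeImmutable $postDate): ?DateTimeImmutable
{
    $expr = strtolower(trim($expr));
    $simple = ['yesterday' => -1, 'today' => 0, 'tomorrow' => 1];
    if (isset($simple[$expr])) {
        return $postDate->modify(sprintf('%+d day', $simple[$expr]));
    }
    // Assumed semantics: "next X" is strictly after, "last X" strictly before.
    if (preg_match('/^(next|last) ([a-z]+)$/', $expr, $m) && array_key_exists($m[2], WEEKDAYS)) {
        $today = (int) $postDate->format('N'); // ISO-8601: 1 = Monday ... 7 = Sunday
        $target = WEEKDAYS[$m[2]];
        if ($m[1] === 'next') {
            $shift = (($target - $today + 7) % 7) ?: 7;
        } else {
            $shift = -((($today - $target + 7) % 7) ?: 7);
        }
        return $postDate->modify(sprintf('%+d day', $shift));
    }
    return null; // not an expression this sketch understands
}

function smartdate_markup(string $expr, DateTimeImmutable $postDate): string
{
    $date = resolve_relative_date($expr, $postDate);
    if ($date === null) {
        return htmlspecialchars($expr);
    }
    return sprintf('<abbr title="%s">%s</abbr>', $date->format('Y-m-d'), htmlspecialchars($expr));
}

$post = new DateTimeImmutable('2007-11-27'); // a Tuesday
echo smartdate_markup('yesterday', $post), "\n";   // <abbr title="2007-11-26">yesterday</abbr>
echo smartdate_markup('next Friday', $post), "\n"; // <abbr title="2007-11-30">next Friday</abbr>
```

Expressions like “this Monday” are left out on purpose, since whether they point forward or backward is exactly the kind of ambiguity a real resolver has to settle.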

For the future

  • static dates: “January 1st, 2007”
  • duration shift expressions: “5 days ago,” “fourscore and seven years ago”
  • day of the week shifts: “2 Fridays ago”
  • clean up the code!

  1. One could argue that relative dates are a perfect place to use the abbr tag, as they are a sort of natural-language shortcut for more static temporal expressions. In fact, WP-Smartdate’s output also follows the datetime microformat design pattern draft, with two caveats: 1. Unfortunately, the datetime semantic class has not yet been set, as the standard is a draft. WP-Smartdate uses datetime. See the Date and Time datatype proposal for more information. 2. The current recommendation for datetime pushes for following the W3C datetime profile, which does not support the ISO-8601 time interval specification that WP-Smartdate will use.

  2. Even though the abbr tag should only be used for machine reading

The Nerd Handbook

Monday, November 12th, 2007

From Rands in Repose’s Nerd Handbook, probably a good guide for Bailey (though I don’t quite fit the target completely):

But in nerds’ bit-based work, progress is measured mentally and invisibly in code, algorithms, efficiency, and small mental victories that don’t exist in a world of atoms.

I feel this phenomenon exists in formal linguistics as well, where the elegance of an analysis may be measured in theory-internal terms. It’s hard to get other people excited when they don’t share that same background, precisely as there is no physical manifestation of an analysis. At least Bailey’s good about listening, trying to understand, and being happy for me. ^^

(via Daring Fireball)

Updating your zenphoto theme for zenphoto 1.1

Sunday, November 4th, 2007

I use zenphoto as the backend to my photos section, with a custom theme that hooks into my site’s navigation and such. I chose zenphoto for my website a year ago based on its main strength: simplicity. It does much less than the competition, but it does what I need it to do, for the most part. It’s a fantastic bare-bones MySQL/PHP photo gallery option.

Since then, though, I (along with many others) have been slightly disappointed by the lack of development in the promising project, without having the time or energy to pitch in myself. Such is life. But now the wait is over: Zenphoto 1.1 is out.

Zenphoto 1.1, I believe, does a good job balancing this tradition of simplicity with some popular new features. Highlights (there are many) include tagging, subalbums, chronological archives, RSS feeds, EXIF support, Google Maps, search, and preliminary video support. Exciting stuff.

Since I maintain my own theme, though, taking advantage of some of these new features means updating it. Below is my rough guide to editing your theme to take maximum advantage of zenphoto 1.1.

(more…)

Mailplane Japanese localization available!

Tuesday, September 4th, 2007

Mailplane, the fabulous Mac OS X Gmail client, just reached 1.5.1, including my Japanese localization. My “exotic” (read “non-Roman”) preferences were even featured on the website! While Mailplane is currently in private beta, I have some invites so let me know if you want in. The localization still has a number of quirks, so I would appreciate any feedback along those lines, too.

UPDATE: Also covered in other blogs