Some of my earliest projects involved the use and development of computational tools for corpus and experimental linguistics, as well as for then-novel forms of human-computer interaction.

Turktools for experimental linguistics

In joint work with Hadas Kotek, I developed turktools, a set of free, open-source tools that allow linguists with little programming background to post linguistic surveys online, for example on Amazon’s Mechanical Turk crowdsourcing platform. Hadas Kotek and I also taught a workshop series at MIT on designing online experiments and using these tools.

The constituency of hypertext corpora

A unique property of the hypertext medium is that authors naturally mark certain stretches of text as units, in the form of inline hyperlinks. What kind of linguistic objects are inline hyperlinks? I have investigated this question using a 5.7M-word corpus of English hypertext containing 375k links. Manual annotation of a sample of 5,000 links from the corpus found that 94% of inline links are constituents of their host sentences. The exceptions largely cluster into other kinds of linguistic objects, e.g. a constituent that is missing a large adjunct at its right edge. I am pursuing the hypothesis that hyperlinking is constrained by the author’s knowledge of the text’s underlying linguistic structure, and could therefore ultimately serve as a test for syntactic constituency once we better understand the range of possible exceptions.
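To make the notion of "constituent in the host sentence" concrete, here is a minimal sketch of how the check could be automated for a sentence that already has a bracketed parse. It is only an illustration: the annotation reported above was done manually, and the tree format, the function names and the example sentence are all assumptions introduced for this sketch, not part of the corpus or its tooling.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # word offsets into the sentence, end exclusive


def constituent_spans(bracketed: str) -> Tuple[List[str], Set[Span]]:
    """Return the sentence's words and the word span covered by every tree node.

    Assumes a Penn-Treebank-style bracketing such as "(NP (DT the) (NN corpus))",
    where each opening bracket is immediately followed by a node label.
    """
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    words: List[str] = []
    spans: Set[Span] = set()
    open_starts: List[int] = []  # start offsets of the nodes currently open
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            open_starts.append(len(words))
            i += 2  # skip the bracket and the node label that follows it
        elif tok == ")":
            spans.add((open_starts.pop(), len(words)))
            i += 1
        else:
            words.append(tok)
            i += 1
    return words, spans


def is_constituent(bracketed: str, link_text: str) -> bool:
    """True if some occurrence of link_text exactly matches a node's span."""
    words, spans = constituent_spans(bracketed)
    link = link_text.split()
    n = len(link)
    return any(
        words[start:start + n] == link and (start, start + n) in spans
        for start in range(len(words) - n + 1)
    )


# Hypothetical host sentence with a hand-written parse.
TREE = "(S (NP (DT the) (NN corpus)) (VP (VBZ contains) (NP (JJ many) (NNS links))))"
```

With this hypothetical parse, `is_constituent(TREE, "many links")` holds because the link text lines up with an NP node, while `is_constituent(TREE, "corpus contains")` does not, since those two words straddle the subject and the verb phrase. A link that stops short of a right-edge adjunct would fail in the same way, which is the kind of exception described above.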

Multilingual command parsing for Ubiquity

In 2009, I worked as a researcher at Mozilla, the organization behind the Firefox web browser. I contributed to the development of Ubiquity, an open-source, text-based natural language interface for the browser. I led the design and development of a lightweight parser that the community could easily localize.
