Corpus and computational projects
Some of my earliest projects involved the use and development of computational tools for corpus and experimental linguistics, as well as for then-novel forms of human-computer interaction.
Turktools for experimental linguistics
In joint work with Hadas Kotek, I developed turktools, a set of free, open-source tools that allow linguists with little background in programming to post linguistic surveys online, for example on Amazon’s Mechanical Turk crowdsourcing platform. We also taught a workshop series at MIT on the design of online experiments and the use of these tools.
-
Erlewine and Kotek, 2016.
“A streamlined approach to online linguistic surveys.”
Natural Language & Linguistic Theory 34:2, pages 481–495. Tools available at turktools.net. DOI: 10.1007/s11049-015-9305-9
The constituency of hypertext corpora
A unique property of the hypertext medium is that authors naturally identify certain stretches of text as units, in the form of inline hyperlinks. What kind of linguistic objects are inline hyperlinks? I have investigated this question using a 5.7M-word corpus of English hypertext, which includes 375k links. Manual annotation of a sample of 5,000 links from the corpus found that 94% of inline links are constituents in their host sentences. The exceptions largely cluster into other kinds of linguistic objects, e.g. strings that would be constituents but for a large adjunct at the right edge that the link leaves out. I am pursuing the hypothesis that hyperlinking is constrained by the author’s knowledge of the text’s underlying linguistic structure, and could therefore ultimately serve as a test for syntactic constituency, once we better understand the possible range of exceptions.
-
2011.
“The Constituency of Hyperlinks in a Hypertext Corpus.”
Presented at the International Society for the Linguistics of English (ISLE 2), Boston University.
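As a rough illustration of the first step in a study like this, the sketch below pulls inline link text out of HTML so that each link can later be annotated by hand. It is not the pipeline used for the corpus described above: the class and function names are hypothetical, and treating only links inside `<p>` elements as "inline" is an assumption made here to exclude navigation menus and similar lists.

```python
from html.parser import HTMLParser


class InlineLinkExtractor(HTMLParser):
    """Collect the text of <a href=...> links that occur inside <p> elements.

    Assumption for this sketch: a link counts as "inline" only if it sits
    inside a paragraph; links in menus, lists, etc. are ignored.
    """

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.link_parts = None   # text chunks of the link currently open
        self.links = []          # extracted link strings
        self.word_count = 0      # rough count of words in paragraph text

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
        elif tag == "a" and self.in_paragraph and dict(attrs).get("href"):
            self.link_parts = []

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False
            self.link_parts = None
        elif tag == "a" and self.link_parts is not None:
            # Normalize internal whitespace before storing the link text.
            text = " ".join("".join(self.link_parts).split())
            if text:
                self.links.append(text)
            self.link_parts = None

    def handle_data(self, data):
        if self.in_paragraph:
            # Whitespace tokenization per text chunk: an approximation only.
            self.word_count += len(data.split())
            if self.link_parts is not None:
                self.link_parts.append(data)


def extract_inline_links(html):
    """Return (list of inline link strings, approximate paragraph word count)."""
    parser = InlineLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links, parser.word_count
```

Each extracted string would then be paired with its host sentence and judged manually for constituency; that judgment is the part no script of this kind replaces.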
Multilingual command parsing for Ubiquity
In 2009, I worked as a researcher at Mozilla, the organization behind the Firefox web browser. I contributed to the development of Ubiquity, an open-source, text-based natural language interface for the browser. I led the design and development of a lightweight parser that the community could easily localize.
-
2009.
“Ubiquity: Designing a Multilingual Natural Language Interface.”
Proceedings of the ACM SIGIR 2009 Workshop on Information Access in a Multilingual World, pages 45–48.