Ubiquity Parser: The Next Generation Demo
A week or two ago while visiting California, Jono and I had a productive charrette, resulting in a new architecture proposal for the Ubiquity parser, as laid out in Ubiquity Parser: The Next Generation. The new architecture is designed to support (1) the use of overlord verbs, (2) writing verbs by semantic roles, and (3) better suggestions for verb-final languages and other argument-first contexts. I’m happy to say that I’ve spent some time putting a proof-of-concept together.
I’ve implemented the basic algorithm of this parser for left-branching languages (like English) and also implemented some fake English verbs, noun types, and semantic roles. This demo should give you a basic sense of how this parser will attempt to identify different types of arguments and check their noun types even without clearly knowing the verb. This should make the suggestion ranking much smarter, particularly for verb-final contexts. (For a good example, try "from Tokyo to San Francisco".)
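To make the idea concrete, here is a minimal sketch of identifying arguments by their delimiters and checking a noun type before any verb is known. It assumes a toy setup: the delimiter table, the role names, the `KNOWN_CITIES` noun type and the `findRoles` function are all hypothetical and are not the demo's actual code.

```javascript
// Hypothetical delimiter-to-role table for a toy English setup.
const ROLE_BY_DELIMITER = new Map([
  ["from", "source"],
  ["to", "goal"],
  ["at", "time"],
]);

// Hypothetical "city" noun type: a tiny known list, matched case-insensitively.
const KNOWN_CITIES = new Set(["tokyo", "san francisco", "kyoto"]);

// In this sketch each delimiter claims the words after it, up to the next
// delimiter. Words before the first delimiter are skipped, since the sketch
// has no verb or direct-object handling.
function findRoles(input) {
  const args = [];
  let current = null;
  for (const word of input.trim().split(/\s+/)) {
    const role = ROLE_BY_DELIMITER.get(word.toLowerCase());
    if (role !== undefined) {
      current = { role, delimiter: word, words: [] };
      args.push(current);
    } else if (current !== null) {
      current.words.push(word);
    }
  }
  // Check each argument against the noun type without knowing the verb.
  return args.map(({ role, delimiter, words }) => {
    const text = words.join(" ");
    return { role, delimiter, text, isCity: KNOWN_CITIES.has(text.toLowerCase()) };
  });
}
```

With this toy data, `findRoles("from Tokyo to San Francisco")` returns a source argument "Tokyo" and a goal argument "San Francisco", both flagged as cities. That is the kind of evidence a ranker could use to favor verbs that take a source and a goal, even before the verb has been typed.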
➔ Check out the Ubiquity next-gen parser demo
Clicking on the environment info shows the specific verbs, noun types, and roles implemented. You can also scroll through the current parse section to see a step-by-step derivation of how the suggested parses were constructed.
Within the hour I’ll be starting about 15 hours of travel back to Japan… hopefully I’ll make some more progress on the plane! I look forward to your comments. For those of you interested in checking out the code yourself, you can find it on Bitbucket.
Tags: algorithm, arguments, California, code, interface, javascript, Mozilla Planet, overlord verbs, parser, photo, proposal, semantic role, ubiquity, verb-final, verbs

March 18th, 2009 at 6:29 pm
This is freaking awesome.
March 27th, 2009 at 6:34 am
[…] Last week I released a proof-of-concept demo of the next generation Ubiquity parser design, and it was also the focus of discussion in our weekly internationalization meeting. Christian Sonne even wrote a Danish plugin for it during the meeting, a testament to the pluggability of the new parser design. […]
April 1st, 2009 at 8:01 pm
Very cool.
I have a few comments, though. The "move" command seems to have issues. The Portuguese example doesn't get picked up, and neither does the Spanish version I'm working on; curiously, both Portuguese and Spanish use "ir". I wonder, is "ir" too short? But the longer Portuguese form "vai" does not get picked up either.
Also, for the "goal" and "time" roles, I can't seem to use multiple-word delimiters. A very common delimiter for the goal role would be "a la", and the time role is always introduced by either "a las" or "a la", never a single word. Being restricted to single-word delimiters doesn't feel natural at all in Spanish.
Thanks!!!
April 2nd, 2009 at 2:17 am
Alejandro—thanks for the comments.
Re: move… I noticed that yesterday as well while making my demo video. I'm going to try to figure that out today.
Re: "a la"… yes, this is definitely something we need to allow. My first thought was to join these sequences into "ala" or "a-la" in the wordBreaker method and then treat them as single words later, but that wouldn't give us both the parse where "a la" is used as the goal delimiter and the parse where it isn't… I think the best thing to do is to build multi-word delimiter support into the argFinder method. I'll work on that today as well.
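A minimal sketch of multi-word delimiter matching inside an argument finder might look like the following. It assumes a hypothetical Spanish delimiter list and hypothetical `delimitersAt` and `findArgs` functions; it is not the real argFinder. Because delimiters are matched word by word against the input, "a" and "a las" can both match at the same position, and each match, plus the reading where no delimiter is used at all, becomes a separate candidate parse.

```javascript
// Hypothetical Spanish delimiters, each a sequence of words marking a role.
const SPANISH_DELIMITERS = [
  { words: ["a"], role: "goal" },
  { words: ["a", "la"], role: "goal" },
  { words: ["a", "las"], role: "time" },
  { words: ["en"], role: "location" },
];

// Every delimiter whose words match the input starting at index i.
function delimitersAt(words, i) {
  return SPANISH_DELIMITERS.filter((d) =>
    d.words.every((w, k) => i + k < words.length && words[i + k].toLowerCase() === w)
  );
}

// Enumerate candidate parses: `head` holds the words before the first
// delimiter, `args` holds one entry per delimiter used. At each word we
// branch on "plain word" plus one branch per matching delimiter.
function findArgs(input) {
  const words = input.trim().split(/\s+/);
  const results = [];
  function walk(i, head, args) {
    if (i === words.length) {
      results.push({
        head: head.join(" "),
        args: args.map((a) => ({ role: a.role, delimiter: a.delimiter, text: a.words.join(" ") })),
      });
      return;
    }
    const open = args.length > 0 ? args[args.length - 1] : null;
    // Branch 1: words[i] is ordinary text for the head or the open argument.
    if (open === null) {
      walk(i + 1, [...head, words[i]], args);
    } else {
      walk(i + 1, head, [...args.slice(0, -1), { ...open, words: [...open.words, words[i]] }]);
    }
    // Branch 2: words[i..] starts a new argument with a matching delimiter.
    for (const d of delimitersAt(words, i)) {
      walk(i + d.words.length, head, [...args, { role: d.role, delimiter: d.words.join(" "), words: [] }]);
    }
  }
  walk(0, [], []);
  return results;
}
```

Here `findArgs("cena a las 9")` yields three candidates: no arguments at all, a goal argument "las 9" introduced by "a", and a time argument "9" introduced by "a las". The number of candidates grows quickly with input length, which is the price of keeping every reading available for ranking.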
Thanks for bringing these issues up!
April 2nd, 2009 at 3:50 am
Alejandro, the move verb is working now. The problem was that it didn't take an object argument, and a bug prevented suggestions for verbs that don't take an object argument. That's fixed now, and I've also added the object argument back into the move verb.
April 2nd, 2009 at 6:41 am
Alejandro, upon further consideration, I'm not convinced that multi-word delimiters are necessary. In the case of "a la" or "a las" in Spanish, we would probably also want plain "a" as an equivalent delimiter, in which case we can write a rule to strip the "la" or "las" off the argument when it is assigned to a role. This avoids some major complications that supporting multi-word delimiters would add to the parser.
That said, there may be cases in other languages where multi-word delimiters really are required… the main test would be a case where one delimiter is a substring (or sub-sequence of words) of another delimiter and the two mark different roles, so that they are in competition. As long as we don't have that, though, I think we're safe for the time being.
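The competition test described above can be checked mechanically. This sketch works at the word level and assumes delimiters are stored as arrays of words with a role name; `isSubSequence`, `findConflicts` and the sample data are hypothetical.

```javascript
// True if `inner` appears as a contiguous run of words inside a longer `outer`.
function isSubSequence(inner, outer) {
  if (inner.length >= outer.length) return false;
  for (let start = 0; start + inner.length <= outer.length; start++) {
    if (inner.every((w, k) => outer[start + k] === w)) return true;
  }
  return false;
}

// Report pairs [shorter, longer] where one delimiter is contained in another
// and the two mark different roles, i.e. they compete for the same words.
function findConflicts(delimiters) {
  const conflicts = [];
  for (const a of delimiters) {
    for (const b of delimiters) {
      if (a.role !== b.role && isSubSequence(a.words, b.words)) {
        conflicts.push([a.words.join(" "), b.words.join(" ")]);
      }
    }
  }
  return conflicts;
}

// Hypothetical sample: "a" and "a la" share a role, "a las" does not.
const SAMPLE_DELIMITERS = [
  { words: ["a"], role: "goal" },
  { words: ["a", "la"], role: "goal" },
  { words: ["a", "las"], role: "time" },
  { words: ["en"], role: "location" },
];
```

On the sample list, `findConflicts(SAMPLE_DELIMITERS)` reports only the pair "a" / "a las"; "a" versus "a la" is not flagged because both mark the goal role, so stripping the article would be enough there.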
I added a new method to Parser called cleanArgument which is called whenever an argument is assigned to a role… you can override this method in the Spanish parser so that it strips "la" and "las" off the beginning of arguments. Give that a try.
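A sketch of such an override, assuming `cleanArgument` receives the argument text and returns the cleaned text. That signature, the stand-in `Parser` class and the `assignRole` helper are assumptions for illustration, not the actual Ubiquity code.

```javascript
// Stand-in base parser (hypothetical): cleanArgument is a no-op hook that a
// language-specific parser can override.
class Parser {
  cleanArgument(text) {
    return text;
  }
  // Hypothetical helper showing where the hook would run: whenever an
  // argument is assigned to a role.
  assignRole(role, text) {
    return { role, text: this.cleanArgument(text) };
  }
}

class SpanishParser extends Parser {
  // Strip a leading "la" or "las" only when it is a whole word followed by
  // whitespace, so "a las 2pm" can be handled by the single delimiter "a".
  cleanArgument(text) {
    return text.replace(/^\s*(las|la)\s+/i, "");
  }
}
```

For example, `new SpanishParser().assignRole("time", "las 2pm")` gives `{ role: "time", text: "2pm" }`. Requiring whitespace after the article keeps words such as "lavandería" intact, which matches the later observation in the comments that "la " had to be stripped together with its trailing space.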
April 2nd, 2009 at 4:00 pm
Yeah … that could work.
I'll try it out later today.
I was also thinking that maybe the role could be "a-la" and the parser could capture "a la" and identify it as that role. I think the current Ubiquity does something like that? That way one could type: "agr meeting al calend a las 2pm"
And the parser would read it as: "agregar meeting al calendario a-las 2pm"
Would that be possible?
Anyway, I guess grammatically (linguistically?) the words la, los, las, etc. aren't really part of the role, so cleanArgument may be a better solution.
Thanks!
April 3rd, 2009 at 2:20 am
Re: "a-las": this is the hack I was thinking about doing, and if we do have a real need for multi-word delimiters, this may be what we want to do.
The issue is that we then need to explode the number of possible argStrings before sending it to argFinder, as we don't want to only get parses where "a-las" is used as a delimiter. We may also want the parse which takes "a" as a delimiter and actually sends "las" to the argument, for example, so we'd have to give both "agr meeting al calend a las 2pm" and "agr meeting al calend a-las 2pm" to argFinder. If there are ever multiple multi-word delimiters in an input, this juggling would add an additional layer of recursion to the parser which I'd rather avoid.
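The cost of the joining hack can be seen in a small sketch that generates every argString variant. The `MULTI_WORD_DELIMITERS` list and the `argStringVariants` function are hypothetical; the point is that each occurrence of a multi-word delimiter doubles the number of strings that would have to be sent to argFinder.

```javascript
// Hypothetical list of multi-word delimiters that the hack would join.
const MULTI_WORD_DELIMITERS = [["a", "la"], ["a", "las"]];

// Generate every argString in which each occurrence of a multi-word
// delimiter is either left as separate words or joined with "-".
function argStringVariants(input) {
  const words = input.trim().split(/\s+/);
  const variants = [];
  function build(i, acc) {
    if (i >= words.length) {
      variants.push(acc.join(" "));
      return;
    }
    // Keep words[i] as an ordinary word.
    build(i + 1, [...acc, words[i]]);
    // Or join it with the following words into a single delimiter token.
    for (const delim of MULTI_WORD_DELIMITERS) {
      if (delim.every((w, k) => i + k < words.length && words[i + k].toLowerCase() === w)) {
        build(i + delim.length, [...acc, delim.join("-")]);
      }
    }
  }
  build(0, []);
  return variants;
}
```

For "agr meeting al calend a las 2pm" this produces the original string and "agr meeting al calend a-las 2pm"; an input such as "cena a las 8 y a las 10", with two occurrences, already produces four variants.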
April 9th, 2009 at 7:07 am
[…] and I have recently been working to incorporate Parser: The Next Generation into Ubiquity proper, and this of course involves the process of retooling the standard commands […]
April 9th, 2009 at 10:34 pm
I got it working! I had to strip strings like "la " together with the trailing space. I mention it because I didn't expect that, and it's what gave me a bit of trouble =o)
Do you still need me to send you a diff?
April 10th, 2009 at 2:41 am
Yes, please send me a diff. You can put it up on GitHub or a pastebin or something.