Nountype Quirks: Day 1

Today I began going through all of the nountypes built into Ubiquity, using the principles and criteria I laid out yesterday. This is a task I’ve been planning for a while now. As I explained yesterday, better suggestions and scoring from the built-in nountypes translate directly into better and smarter suggestions overall, and so into a better experience for all users. Here I’ll document some of the nountype quirks I’ve discovered so far and what remedy has been implemented or is planned.

Note: this blog post includes a number of graphs using HTML/CSS formatting. If you are reading this article through a feed reader or planet, I invite you to read it on my site.

noun_type_percentage

Here’s what a few different inputs originally returned:

| input | suggestion | score |
| --- | --- | --- |
| 20 | 20% | 1 |
| 20% | 20% | 1 |
| 0.2 | 20% | 1 |
| 0.2% | 20% | 1 |
| 20.0 | 2000% | 1 |
| 2 hens in the garden | 2% | 1 |

Let me highlight a couple of obvious quirks:

  1. When the number includes a decimal point and is less than one, it is interpreted as a proportion rather than a percentage, e.g. “0.2” → “20%”. “0.2%” is not even offered as an option, even when the input explicitly includes a % sign.
  2. All suggestions, including those where the numeral was extracted from a long string of text (e.g. “2 hens in the garden”), get the same score of 1.

I just committed a fix so that noun_type_percentage now does the following:

  1. It counts the number of characters in the input which match [\d.%] and caps the score at (number of acceptable characters)/(length of input).
  2. Strings which do not include “%” get a 10% penalty.
  3. For decimals less than 1 without a % sign, the proportion interpretation (e.g. “0.2” → “20%”) is also suggested in addition to the literal suggestion (“0.2%”), but with a slight penalty.
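
To make these three rules concrete, here is a minimal sketch of how they could fit together. This is not the code that was committed: the function name suggestPercentages is hypothetical, and the “slight penalty” is assumed to be another factor of 0.9.

```javascript
// Hypothetical sketch of the scoring rules above; not the actual Ubiquity code.
const NO_PERCENT_PENALTY = 0.9;  // from the post: 10% penalty without a "%"
const PROPORTION_PENALTY = 0.9;  // assumed value for the "slight penalty"

function suggestPercentages(input) {
  // Pull the first number out of the input, wherever it appears.
  const match = input.match(/\d*\.?\d+/);
  if (!match) return [];
  const number = parseFloat(match[0]);

  // Rule 1: the score is capped by the fraction of "acceptable" characters.
  const acceptable = (input.match(/[\d.%]/g) || []).length;
  const cap = acceptable / input.length;

  // Rule 2: inputs without an explicit "%" are penalized.
  const hasPercent = input.includes("%");
  const base = hasPercent ? 1 : NO_PERCENT_PENALTY;

  const suggestions = [
    { text: number + "%", score: Math.min(base, cap) },
  ];

  // Rule 3: a bare decimal below 1 may also have been meant as a proportion.
  if (!hasPercent && match[0].includes(".") && number < 1) {
    // toPrecision guards against floating-point noise in number * 100.
    const asPercent = parseFloat((number * 100).toPrecision(12));
    suggestions.push({
      text: asPercent + "%",
      score: Math.min(base * PROPORTION_PENALTY, cap),
    });
  }
  return suggestions;
}
```

Under these assumptions, “0.2” yields “0.2%” at 0.9 plus “20%” at roughly 0.81, while “2 hens in the garden” is capped at 1/20 because only one of its twenty characters is acceptable.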

Here is what they now return:

| input | suggestion | score |
| --- | --- | --- |
| 20 | 20% | 0.9 |
| 20% | 20% | 1 |
| 0.2 | 0.2% | 0.9 |
|  | 20% | 0.81 |
| 0.2% | 0.2% | 1 |
| 20.0 | 20% | 0.9 |
| 2 hens in the garden | 2% | 0.05 |

noun_type_tag

Here’s what a few different inputs originally returned. Keep in mind that currently in this test profile, the preexisting tags are “animal”, “help”, “test”, and “ubiquity”.

| input | suggestion | score |
| --- | --- | --- |
| animal | animal | 0.3 |
| mineral | mineral | 0.3 |
| anim | animal | 0.7 |
|  | anim | 0.3 |
| help, test, ubiq | help,test,ubiquity | 0.7 |
|  | help,test,ubiq | 0.3 |
| google, yahoo, ubiq | google,yahoo,ubiquity | 0.7 |
|  | google,yahoo,ubiq | 0.3 |
| google, , yahoo | google,yahoo | 0.3 |

Here are a few of noun_type_tag’s quirks:

  1. There are only two scores ever given out: 0.3 and 0.7.
  2. Only the last tag in the list and whether it exists or not is taken into account.
  3. When the last tag is incomplete, the completion is suggested with a higher score, but if the last tag is exactly equal to an existing tag, it gets the lower score.

Ideally, we want noun_type_tag to look at each of the tags given to it, giving higher scores when more of the tags are preexisting and fewer are new. Keep in mind, though, that we only have to suggest a completion for the very last tag, since that is the one the user may not have finished typing yet… for earlier tags, we can assume (safely or not) that the user placed the comma where they meant to. We can’t teach Ubiquity to read minds, after all.[1]

With this in mind, I just made a change to noun_type_tag which aims to follow these principles. The basic idea is that we start with a base score of 0.3 and then raise it by taking an nth root for every tag in the sequence which is preexisting (since the score is below 1, taking a root increases it). Here’s what the same inputs return now. Recall that the preexisting tags are “animal”, “help”, “test”, and “ubiquity”.

| input | suggestion | score |
| --- | --- | --- |
| animal | animal | 0.55 |
| mineral | mineral | 0.3 |
| anim | animal | 0.55 |
|  | anim | 0.3 |
| help, test, ubiq | help,test,ubiquity | 0.86 |
|  | help,test,ubiq | 0.74 |
| google, yahoo, ubiq | google,yahoo,ubiquity | 0.55 |
|  | google,yahoo,ubiq | 0.3 |
| google, , yahoo | google,yahoo | 0.3 |
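
The scores in this table are consistent with taking a square root once per preexisting tag (0.3 → 0.55 → 0.74 → 0.86), so here is a minimal sketch under that assumption. The names scoreTags and suggestTags are hypothetical, and the real nountype may well compute this differently.

```javascript
// Hypothetical sketch; assumes "one square root per preexisting tag".
const BASE_SCORE = 0.3;

// Start from the base score and take a square root for each tag that exists.
function scoreTags(tags, existingTags) {
  let score = BASE_SCORE;
  for (const tag of tags) {
    if (existingTags.has(tag)) score = Math.sqrt(score);
  }
  return score;
}

function suggestTags(input, existingTags) {
  // Split on commas, trim whitespace, and drop purely-whitespace tags.
  const tags = input.split(",").map((t) => t.trim()).filter((t) => t !== "");
  if (tags.length === 0) return [];

  const head = tags.slice(0, -1);
  const last = tags[tags.length - 1];
  const suggestions = [];

  // Only the last tag may be unfinished, so only it gets completions.
  for (const existing of existingTags) {
    if (existing !== last && existing.startsWith(last)) {
      const completed = [...head, existing];
      suggestions.push({
        text: completed.join(","),
        score: scoreTags(completed, existingTags),
      });
    }
  }

  // Always offer the tags exactly as typed.
  suggestions.push({ text: tags.join(","), score: scoreTags(tags, existingTags) });
  return suggestions.sort((a, b) => b.score - a.score);
}
```

Called with a Set of the four preexisting tags, this sketch offers “help,test,ubiquity” ahead of “help,test,ubiq” for the input “help, test, ubiq”, and it collapses “google, , yahoo” to “google,yahoo” at the base score.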

noun_type_awesomebar

| input | suggestion | score |
| --- | --- | --- |
| moz | http://www.mozilla.com/ | 0.8 |
|  | https://wiki.mozilla.org/Labs/Ubiquity/Parser_2_API_Conversion_Tutorial | 0.8 |
|  | http://en-us.start3.mozilla.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official | 0.8 |
|  | http://en-us.www.mozilla.com/en-US/firefox/about/ | 0.8 |

There are a couple of quirks here:

  1. All suggestions are returned with the same score.
  2. The nountype returns the URL of the entry as the HTML-formatted result and the title as the text-formatted result, which clearly does not make sense. However, it’s not clear to me whether the title, URL, or some combination of both is what we should be returning as the suggestion text presented to the user.[2]

I just rewrote noun_type_awesomebar to actually do some differential scoring. This new version also presents either the URL or the title, whichever matches the input better according to the matchScore function.[3]

| input | suggestion | score |
| --- | --- | --- |
| moz | www.mozilla.com | 0.7 |
|  | https://wiki.mozilla.org/Labs/Ubiquity/Parser_2_API_Conversion_Tutorial | 0.63 |
|  | http://en-us.start3.mozilla.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official | 0.61 |
|  | http://en-us.www.mozilla.com/en-US/firefox/about/ | 0.6 |
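
The “URL or title, whichever matches better” idea can be sketched roughly as follows. Note that standInMatchScore is only a placeholder I made up for illustration (a substring match weighted by how much of the candidate the input covers); it is not Ubiquity’s matchScore, and it will not reproduce the scores in the table above. The function name suggestFromHistoryEntry and the shape of the entry object are likewise assumptions.

```javascript
// Placeholder scorer, NOT Ubiquity's matchScore: case-insensitive substring
// match, scored by the fraction of the candidate that the input covers.
function standInMatchScore(input, candidate) {
  if (!input || !candidate) return 0;
  const index = candidate.toLowerCase().indexOf(input.toLowerCase());
  return index === -1 ? 0 : input.length / candidate.length;
}

// Given one history entry ({ url, title }), present whichever field matched
// the input better, and use that field's score as the suggestion score.
function suggestFromHistoryEntry(input, entry) {
  const urlScore = standInMatchScore(input, entry.url);
  const titleScore = standInMatchScore(input, entry.title);
  if (urlScore === 0 && titleScore === 0) return null;
  return titleScore > urlScore
    ? { text: entry.title, score: titleScore }
    : { text: entry.url, score: urlScore };
}
```

The point of the design is that the score and the displayed text come from the same field, so different history entries no longer all tie at one fixed value.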

noun_type_url

The purpose of noun_type_url’s suggest function is two-fold: first, to accept strings which may look like a URL and, second, to suggest URLs from the history just like noun_type_awesomebar, but only based on URL matches and not title matches.[4] Here are a few sample inputs:

| input | suggestion | score |
| --- | --- | --- |
| moz | http://www.mozilla.com/ | 0.9 |
|  | http://moz | 0.5 |
|  | https://wiki.mozilla.org/Labs/Ubiquity/Parser_2_API_Conversion_Tutorial | 0.9 |
|  | http://en-us.start3.mozilla.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official | 0.9 |
|  | http://en-us.www.mozilla.com/en-US/firefox/about/ | 0.9 |
| test | http://test | 0.5 |
| http:// | http:// | 0.5 |
| http: | http: | 0.5 |
| http | http | 0.5 |
| _test | http://_test | 0.5 |
| hello world! | http://hello world! | 0.5 |

Oh, where to begin!? Here are some initial quirks… it’s possible that you could think of more!

  1. There is no differential scoring… only 0.9 for suggestions from history and 0.5 for URL-like strings.
  2. A number of invalid domain names are being accepted and turned into suggestions (“hello world!”, “_test”, etc.).
  3. It’s trying to be smart by suggesting “http://” as a default URI scheme, but it does so even for prefixes (initial substrings) of the word “http” itself.

With these thoughts in mind, I just took a first stab at improving this situation. Here are some features of the new implementation:

  1. History entries are scored in the same way as in noun_type_awesomebar, using matchScore.
  2. URLs without an explicit URI scheme (like “http://”) get a 10% penalty.
  3. “http://” is only suggested if none of a long list of common URI schemes is detected.
  4. It repairs schemes which are missing a slash or two, for example suggesting “http://hello.com” for “http:hello.com”.
  5. It actually uses Firefox’s own IDNService to check if the domain name is a valid internationalized domain name (IDN). If it’s an IDN as opposed to LDH (“letters, digits, and hyphens”), it gets a 10% penalty. If it’s not even a valid IDN, it is ruled out (see last two example inputs below).
  6. There are also penalties for only being a domain name with no path and for the domain not having any periods (.) in it.
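
As a rough illustration of how items 2 through 6 might combine, here is a minimal sketch. It is not the real implementation: the scheme list is a short stand-in for the “long list”, the two regular expressions are a crude substitute for Firefox’s IDNService, and the no-path and no-dot penalty values are assumed, so the scores will not match the table below. It also ignores history matching, ports, and the completion of partially typed schemes.

```javascript
// Hypothetical sketch; values marked "assumed" do not come from Ubiquity.
const KNOWN_SCHEMES = ["http", "https", "ftp", "file"]; // assumed short list
const NO_SCHEME_PENALTY = 0.9; // from the post: 10% penalty
const IDN_PENALTY = 0.9;       // from the post: 10% penalty
const NO_PATH_PENALTY = 0.9;   // assumed value
const NO_DOT_PENALTY = 0.9;    // assumed value

// Crude stand-ins for IDNService: LDH labels, then "any letters or digits".
const LDH_LABEL = /^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/i;
const IDN_LABEL = /^[\p{L}\p{N}]([\p{L}\p{N}-]*[\p{L}\p{N}])?$/u;

function classifyHost(host) {
  const labels = host.split(".");
  if (labels.every((label) => LDH_LABEL.test(label))) return "ldh";
  if (labels.every((label) => IDN_LABEL.test(label))) return "idn";
  return "invalid";
}

function suggestUrl(input) {
  let score = 1;
  let scheme = "http";
  let rest = input;

  // Detect a known scheme followed by ":" and zero, one, or two slashes.
  const m = input.match(/^([a-z][a-z0-9+.-]*):\/{0,2}(.*)$/i);
  if (m && KNOWN_SCHEMES.includes(m[1].toLowerCase())) {
    scheme = m[1].toLowerCase();
    rest = m[2];
  } else {
    score *= NO_SCHEME_PENALTY; // fall back to the default scheme
  }

  const slash = rest.indexOf("/");
  const host = slash === -1 ? rest : rest.slice(0, slash);
  const path = slash === -1 ? "" : rest.slice(slash);

  const kind = classifyHost(host);
  if (kind === "invalid") return null;        // ruled out entirely
  if (kind === "idn") score *= IDN_PENALTY;
  if (path === "") score *= NO_PATH_PENALTY;
  if (!host.includes(".")) score *= NO_DOT_PENALTY;

  // Rebuilding the URL with "://" is what repairs missing slashes.
  return { text: scheme + "://" + host + path, score };
}
```

In this sketch “http:hello.com” comes back as “http://hello.com”, while “_test” and “hello world!” are rejected outright because their hosts fail both label checks.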

Here is what our suggestions now look like:

| input | suggestion | score |
| --- | --- | --- |
| moz | http://www.mozilla.com/ | 0.6 |
|  | http://moz | 0.65 |
|  | https://wiki.mozilla.org/Labs/Ubiquity/Parser_2_API_Conversion_Tutorial | 0.63 |
|  | http://en-us.start3.mozilla.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official | 0.61 |
|  | http://en-us.www.mozilla.com/en-US/firefox/about/ | 0.6 |
| test | http://test | 0.65 |
| http:// | http:// | 1 |
|  | shttp:// | 0.75 |
| http: | http:// | 0.9 |
|  | shttp:// | 0.7 |
| http | http:// | 0.72 |
|  | https:// | 0.71 |
|  | shttp:// | 0.68 |
|  | http://http | 0.65 |
| _test | none |  |
| hello world! | none |  |

See you tomorrow~

Alright, enough nountype wrangling for one day. I’ll be back again tomorrow for another installment.

  1. If we could make assumptions about what tags look like, for example that they are always fairly short or only use certain character classes, we could also use such factors to judge new tags for “tagginess”. Unfortunately, it’s possible (though unlikely) that a user would prefer really long tag strings, and of course Firefox allows tags in any Unicode range. The only strings we can immediately rule out as impossible are ones which are purely whitespace.

  2. It’s unclear whether the method we’re using (nsIAutoCompleteSearch) is actually searching titles or not… it currently looks like it’s only looking at the URLs. Perhaps the title query is what we’re supposed to enter in the mystery parameter.

  3. I hope to discuss the matchScore function in a separate blog post later. 

  4. While writing up this section I ran into a bug: when both noun_type_awesomebar and noun_type_url are active, only one of their async callbacks from Utils.history.search comes back. Thus, if we’re lucky, only one of the nountypes will return the history results, and if we’re unlucky, the parse query will not complete. Filed as trac #845.