Nountype Quirks: Day 1
Today I began going through all of the nountypes built into Ubiquity using the principles and criteria I laid out yesterday, a task I’ve been planning for a while now. As I explained yesterday, better suggestions and scoring from the built-in nountypes could translate directly into smarter overall suggestions, resulting in a better experience for all users. Here I’ll document some of the nountype quirks I’ve discovered so far and what remedy has been implemented or is planned.
Note: this blog post includes a number of graphs using HTML/CSS formatting. If you are reading this article through a feed reader or planet, I invite you to read it on my site.
noun_type_percentage
Here’s what a few different inputs originally returned:
| input | suggestion | score |
|---|---|---|
| 20 | 20% | 1 |
| 20% | 20% | 1 |
| 0.2 | 20% | 1 |
| 0.2% | 20% | 1 |
| 20.0 | 2000% | 1 |
| 2 hens in the garden | 2% | 1 |
Let me highlight a couple of obvious quirks:
- In certain cases, where the numerical expression includes a decimal point, it is interpreted as a proportion rather than a percentage, e.g. “0.2” → “20%” (and even “20.0” → “2000%”). “0.2%” is not even offered as an option, and this happens even when a % sign is explicitly added.
- All suggestions, including those where the numeral was extracted from a long string of text (e.g. “2 hens in the garden”), get the same score of 1.
I just committed a fix so that noun_type_percentage now…
- Counts the number of characters in the input which match [\d.%] and caps the score at (number of acceptable characters)/(length of input).
- Gives strings which do not include “%” a 10% penalty.
- In the case of decimals less than 1 without a % sign, also suggests the proportional interpretation (e.g. “0.2” → “20%”) in addition to the literal one (“0.2%”), but with a slight penalty.
Here is what the same inputs return now:
| input | suggestion | score |
|---|---|---|
| 20 | 20% | 0.9 |
| 20% | 20% | 1 |
| 0.2 | 0.2% | 0.9 |
| | 20% | 0.81 |
| 0.2% | 0.2% | 1 |
| 20.0 | 20% | 0.9 |
| 2 hens in the garden | 2% | 0.05 |
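To make the new rules concrete, here is a small Python sketch of them. It is not the actual noun_type_percentage code (which is JavaScript inside Ubiquity): the names suggest_percentage, NUMBER and _fmt are hypothetical, and the 10% size of the “slight penalty” on the proportional reading is inferred from the 0.9 → 0.81 step in the table above rather than taken from the commit.

```python
import re

# Characters that can legitimately appear in a percentage expression.
ACCEPTABLE_CHAR = re.compile(r"[\d.%]")
# Hypothetical helper: the first number in the input, with an optional decimal part.
NUMBER = re.compile(r"\d+(?:\.\d+)?|\.\d+")


def _fmt(value):
    """Format a number without trailing zeros, e.g. 20.0 -> "20", 0.2 -> "0.2"."""
    return f"{value:.10f}".rstrip("0").rstrip(".")


def suggest_percentage(text):
    """Return (suggestion, score) pairs following the revised scoring rules (sketch)."""
    text = text.strip()
    match = NUMBER.search(text)
    if match is None:
        return []
    number = match.group()
    value = float(number)
    has_percent = "%" in text

    # 10% penalty for inputs without an explicit "%" sign.
    score = 1.0 if has_percent else 0.9
    # Cap the score by the share of characters that match [\d.%].
    score = min(score, len(ACCEPTABLE_CHAR.findall(text)) / len(text))

    # The literal reading always comes first: "0.2" -> "0.2%".
    suggestions = [(_fmt(value) + "%", score)]
    if not has_percent and "." in number and value < 1:
        # Proportional reading ("0.2" -> "20%") with a further 10% penalty
        # (assumed; inferred from 0.9 -> 0.81 in the results table).
        suggestions.append((_fmt(value * 100) + "%", score * 0.9))
    return suggestions
```

Run on the sample inputs, this reproduces the scores in the table; for example, “2 hens in the garden” yields a single “2%” suggestion scored 0.05, because only 1 of its 20 characters matches [\d.%] and the cap wins over the 0.9 penalty score.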
noun_type_tag
Here’s what a few different inputs originally returned. Keep in mind that currently in this test profile, the preexisting tags are “animal”, “help”, “test”, and “ubiquity”.
| input | suggestion | score |
|---|---|---|
| animal | animal | 0.3 |
| mineral | mineral | 0.3 |
| anim | animal | 0.7 |
| | anim | 0.3 |
| help, test, ubiq | help,test,ubiquity | 0.7 |
| | help,test,ubiq | 0.3 |
| google, yahoo, ubiq | google,yahoo,ubiquity | 0.7 |
| | google,yahoo,ubiq | 0.3 |
| google, , yahoo | google,yahoo | 0.3 |
Here are a few of noun_type_tag’s quirks:
- There are only two scores ever given out: 0.3 and 0.7.
- Only the last tag in the list and whether it exists or not is taken into account.
- When the last tag is incomplete, the completion is suggested with a higher score, but if the last tag is exactly equal to an existing tag, it gets the lower score.
Ideally, we want noun_type_tag to look at each of the tags given to it, giving higher scores when there are more preexisting tags and fewer new ones. Keep in mind, though, that we only need to suggest completions for the very last tag, since that is the one the user may not have finished typing; for earlier tags, we can assume (safely or not) that the user placed the comma where they meant to. We can’t teach Ubiquity to read minds, after all.¹
With this in mind, I just made a change to noun_type_tag which aims to follow these principles. The basic idea is that we start with a base score of 0.3 and then raise it by taking an nth root for every preexisting tag in the sequence. Here’s what the same inputs return now. Recall that the preexisting tags are “animal”, “help”, “test”, and “ubiquity”.
| input | suggestion | score |
|---|---|---|
| animal | animal | 0.55 |
| mineral | mineral | 0.3 |
| anim | animal | 0.55 |
| | anim | 0.3 |
| help, test, ubiq | help,test,ubiquity | 0.86 |
| | help,test,ubiq | 0.74 |
| google, yahoo, ubiq | google,yahoo,ubiquity | 0.55 |
| | google,yahoo,ubiq | 0.3 |
| google, , yahoo | google,yahoo | 0.3 |
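The scores in this table are consistent with taking one square root per preexisting tag, i.e. a score of 0.3^(1/2^k) for k preexisting tags (0.3 → 0.55 → 0.74 → 0.86). The Python sketch below reproduces the table under that reading. The function names are hypothetical, and counting a completed last tag (“anim” → “animal”) as preexisting is my interpretation of the 0.55 in that row, not something stated in the commit.

```python
BASE_SCORE = 0.3


def tag_score(preexisting):
    """One square root of the base score per preexisting tag: 0.3 ** (1 / 2**k)."""
    return BASE_SCORE ** (0.5 ** preexisting)


def suggest_tags(text, existing):
    """Return (tag string, score) pairs for a comma-separated tag input (sketch)."""
    # Split on commas and drop empty or whitespace-only entries.
    tags = [t.strip() for t in text.split(",") if t.strip()]
    if not tags:
        return []
    *earlier, last = tags
    # Earlier tags are taken as typed; count how many of them already exist.
    known = sum(1 for t in earlier if t in existing)

    suggestions = []
    # Only the last tag may be unfinished, so only it gets completions.
    for tag in sorted(existing):
        if tag.startswith(last):
            suggestions.append((",".join(earlier + [tag]), tag_score(known + 1)))
    # Also offer the input as typed, unless its last tag already exists
    # (then the completion loop above has already produced it).
    if last not in existing:
        suggestions.append((",".join(tags), tag_score(known)))
    return suggestions
```

For example, with the existing tags {“animal”, “help”, “test”, “ubiquity”}, “help, test, ubiq” yields “help,test,ubiquity” at about 0.86 and “help,test,ubiq” at about 0.74, matching the table.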
noun_type_awesomebar
Here’s what the input “moz” originally returned:
| input | suggestion | score |
|---|---|---|
| moz | http://www.mozilla.com/ | 0.8 |
| | https://wiki.mozilla.org/Labs/Ubiquity/Parser_2_API_Conversion_Tutorial | 0.8 |
| | http://en-us.start3.mozilla.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official | 0.8 |
| | http://en-us.www.mozilla.com/en-US/firefox/about/ | 0.8 |
There are a couple of quirks here:
- All suggestions are returned with the same score.
- The nountype returns the URL of the entry as the HTML-formatted result and the title as the text-formatted result, which clearly does not make sense. However, it’s not clear to me whether the title, the URL, or some combination of both is what we should be presenting to the user as the suggestion text.²
I just rewrote noun_type_awesomebar to actually do some differential scoring. The new version also presents either the URL or the title, whichever matches better according to the matchScore function.³ Here is what “moz” returns now:
| input | suggestion | score |
|---|---|---|
| moz | www.mozilla.com | 0.7 |
| | https://wiki.mozilla.org/Labs/Ubiquity/Parser_2_API_Conversion_Tutorial | 0.63 |
| | http://en-us.start3.mozilla.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official | 0.61 |
| | http://en-us.www.mozilla.com/en-US/firefox/about/ | 0.6 |
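The post doesn’t describe how matchScore works, so the sketch below uses a made-up toy_match_score (earlier and fuller substring matches score higher) purely to illustrate the selection step: score the query against both the URL and the title, and display whichever scores better. It will not reproduce the numbers above, and both function names are hypothetical.

```python
def toy_match_score(query, candidate):
    """Hypothetical stand-in for matchScore: 0 if no match, higher for early, fuller matches."""
    q, c = query.lower(), candidate.lower()
    position = c.find(q)
    if not c or position < 0:
        return 0.0
    earliness = 1 - position / len(c)  # 1.0 when the match starts at the beginning
    coverage = len(q) / len(c)         # share of the candidate the query covers
    return 0.5 * earliness + 0.5 * coverage


def pick_display(query, url, title, score=toy_match_score):
    """Return (text to display, score) using whichever of URL or title matches better."""
    url_score, title_score = score(query, url), score(query, title)
    if url_score >= title_score:
        return url, url_score
    return title, title_score
```

With a hypothetical history entry titled “Mozilla Firefox” at http://www.mozilla.com/, the query “moz” matches the title at position 0 but the URL only at position 11, so the title would be displayed.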
noun_type_url
The purpose of noun_type_url’s suggest function is twofold: first, to accept strings which look like they may be URLs and, second, to suggest URLs from the history just like noun_type_awesomebar, but based only on URL matches and not title matches.⁴ Here are a few sample inputs:
| input | suggestion | score |
|---|---|---|
| moz | http://www.mozilla.com/ | 0.9 |
| | http://moz | 0.5 |
| | https://wiki.mozilla.org/Labs/Ubiquity/Parser_2_API_Conversion_Tutorial | 0.9 |
| | http://en-us.start3.mozilla.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official | 0.9 |
| | http://en-us.www.mozilla.com/en-US/firefox/about/ | 0.9 |
| test | http://test | 0.5 |
| http:// | http:// | 0.5 |
| http: | http: | 0.5 |
| http | http | 0.5 |
| _test | http://_test | 0.5 |
| hello world! | http://hello world! | 0.5 |
Oh, where to begin!? Here are some initial quirks… it’s possible that you could think of more!
- There is no differential scoring… only 0.9 for suggestions from history and 0.5 for URL-like strings.
- A number of invalid domain names are being accepted and turned into suggestions (“hello world!”, “_test”, etc.).
- It’s trying to be smart by suggesting “http://” as a default URI scheme, but it does so even for prefixes (initial substrings) of the word “http” itself.
With these thoughts in mind, I just took a first stab at improving this situation. Here are some features of the new implementation:
- History entries are scored in the same way as in noun_type_awesomebar, using matchScore.
- URLs without an explicit URI scheme (like “http://”) get a 10% penalty.
- “http://” is only suggested if none of a long list of common URI schemes is detected.
- It repairs schemes which are missing a slash or two, suggesting for example “http:hello.com” → “http://hello.com”.
- It actually uses Firefox’s own IDNService to check whether the domain name is a valid internationalized domain name. If it’s an IDN as opposed to LDH (“letters, digits, and hyphens”), it gets a 10% penalty. If it’s not even a valid IDN, it is ruled out (see the last two example inputs below).
- There are also penalties for being only a domain name with no path and for the domain not having any periods (.) in it.
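Here is a rough Python sketch of just the scheme handling in the list above: repairing a known scheme that is missing a slash or two, and otherwise prepending “http://” with the 10% penalty. KNOWN_SCHEMES is a small hypothetical stand-in for the real “long list”, restricted to schemes normally written with “//”, and normalize_scheme is my own name; scheme-prefix completion (e.g. “http” → “https://”) is left out.

```python
import re

# Hypothetical, much shorter stand-in for the real list of common schemes;
# limited to schemes that are normally followed by "//".
KNOWN_SCHEMES = {"http", "https", "shttp", "ftp"}
# A scheme name followed by ":" and zero, one, or two slashes.
SCHEME_PREFIX = re.compile(r"^([a-z][a-z0-9+.-]*):/{0,2}", re.IGNORECASE)


def normalize_scheme(text):
    """Return (url, score multiplier) after scheme repair or defaulting (sketch)."""
    match = SCHEME_PREFIX.match(text)
    if match and match.group(1).lower() in KNOWN_SCHEMES:
        # Known scheme: rebuild it with exactly two slashes, no penalty.
        scheme = match.group(1).lower()
        return f"{scheme}://{text[match.end():]}", 1.0
    # No recognized scheme: prepend "http://" with a 10% penalty.
    return f"http://{text}", 0.9
```

So “http:hello.com” becomes (“http://hello.com”, 1.0), while “moz” becomes (“http://moz”, 0.9) and still has to pass the host checks and the other penalties.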
Here is what our suggestions now look like:
| input | suggestion | score |
|---|---|---|
| moz | http://www.mozilla.com/ | 0.6 |
| | http://moz | 0.65 |
| | https://wiki.mozilla.org/Labs/Ubiquity/Parser_2_API_Conversion_Tutorial | 0.63 |
| | http://en-us.start3.mozilla.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official | 0.61 |
| | http://en-us.www.mozilla.com/en-US/firefox/about/ | 0.6 |
| test | http://test | 0.65 |
| http:// | http:// | 1 |
| | shttp:// | 0.75 |
| http: | http:// | 0.9 |
| | shttp:// | 0.7 |
| http | http:// | 0.72 |
| | https:// | 0.71 |
| | shttp:// | 0.68 |
| | http://http | 0.65 |
| _test | none | |
| hello world! | none | |
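The real IDN check relies on Firefox’s IDNService, which isn’t available outside the browser. To illustrate only the accept / penalize / reject logic, the sketch below checks ASCII labels against an LDH pattern and uses Python’s built-in idna codec as a much looser stand-in for the IDN validity test. The function names are hypothetical, and the no-path and no-period penalties are omitted because their sizes aren’t given above.

```python
import re

# One LDH label: letters, digits and inner hyphens, at most 63 characters.
LDH_LABEL = re.compile(r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$", re.IGNORECASE)


def extract_host(url):
    """Host part of a URL: text after '://' up to the first / ? or #, minus any port."""
    rest = url.split("://", 1)[-1]
    host = re.split(r"[/?#]", rest, maxsplit=1)[0]
    return host.rsplit(":", 1)[0]


def host_multiplier(host):
    """1.0 for an all-LDH host, 0.9 for a plausible IDN, None if it is ruled out."""
    if not host:
        return None
    multiplier = 1.0
    for label in host.split("."):
        if label.isascii():
            if not LDH_LABEL.match(label):
                return None  # e.g. "_test", "hello world!" or an empty label
        else:
            try:
                label.encode("idna")  # loose stand-in for Firefox's IDN check
            except UnicodeError:
                return None
            multiplier = 0.9  # IDN rather than LDH: 10% penalty
    return multiplier
```

Under this sketch “http://_test” and “http://hello world!” are ruled out, as in the last two rows of the table, while a host like “bücher.de” is accepted with the 10% IDN penalty.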
See you tomorrow~
Alright, enough nountype wrangling for one day. I’ll be back again tomorrow for another installment.
1. If we could make assumptions about what tags look like, for example that they are always pretty short or use certain character classes, we could use such factors to judge non-preexisting tags for “tagginess”, but unfortunately it’s possible (though unlikely) that a user would prefer really long tag strings, and of course Firefox allows tags in any Unicode code range. The only strings we can immediately rule out as impossible are ones which are purely whitespace.
2. It’s unclear whether the method we’re using (nsIAutoCompleteSearch) is searching titles at all… it currently looks like it’s only looking at the URLs. Perhaps the title query is what we’re supposed to enter in the mystery parameter.
3. I hope to discuss the matchScore function in a separate blog post later.
4. While writing up this section I ran into a bug whereby, when both noun_type_awesomebar and noun_type_url are active, only one of their async callbacks from Utils.history.search is returned. Thus, if lucky, only one of the nountypes will return the history results, and if unlucky the parse query will not complete. Filed as trac #845.