Using Google Keyboard when typing in a language other than English can be really frustrating. The Nexus 5X default keyboard keeps changing correctly spelled Polish sentences into utter nonsense. It is a lot better for English, but the spell checker still doesn’t recognize many correctly spelled words.
To get some hard data on the subject, I compared the Google Keyboard dictionary with Hunspell, a popular spellchecking library used in Chrome, Firefox, Mac OS X, Ubuntu and many others. Depending on the language, Google Keyboard recognizes between 4% and 77% of words known to Hunspell.
Language | Hunspell words | Recognized (sample) | Unrecognized (sample) | % Recognized |
---|---|---|---|---|
en_US | 152582 | 7747 | 2254 | 77% |
es_ES | 885418 | 2480 | 7521 | 25% |
pl_PL | 3624473 | 511 | 9490 | 5% |
ru_RU | 4201083 | 361 | 9640 | 4% |
You can find the test rig on GitHub. It generates a list of all words in a Hunspell dictionary and checks whether Google Keyboard recognizes randomly selected words (10,001 per language in the table above). It uses the Android Spell Checker API because it is easier to test automatically than autocomplete, and the two apparently use the same dictionary.
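The sampling step can be sketched roughly as follows. This is not the code from the test rig: `recognition_rate` and the `is_recognized` callable are hypothetical names, and `is_recognized` stands in for whatever call actually asks the keyboard's spell checker about a word.

```python
import random


def recognition_rate(words, is_recognized, sample_size, seed=0):
    """Check a random sample of words and summarize the result.

    `is_recognized` is a hypothetical callable that stands in for a
    query to the spell checker under test.
    Returns (recognized, unrecognized, percent_recognized).
    """
    rng = random.Random(seed)
    sample = rng.sample(words, min(sample_size, len(words)))
    recognized = sum(1 for word in sample if is_recognized(word))
    unrecognized = len(sample) - recognized
    percent = round(100 * recognized / len(sample))
    return recognized, unrecognized, percent


# Toy usage: a "spell checker" that only knows three of four words.
known = {"keyboard", "word", "list"}
result = recognition_rate(
    ["keyboard", "word", "list", "keyboard's"],
    lambda word: word in known,
    sample_size=4,
)
```

The same arithmetic reproduces the last column of the table: 7747 recognized out of 7747 + 2254 sampled words rounds to 77%.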
Diagnosis
Google Keyboard uses a fixed list of around 200k words and cannot recognize many derived words created by appending suffixes (e.g. ~’s for the possessive form of English nouns). The other tested languages have far more suffixes and prefixes because of conjugation and declension, which explains why the percentage of recognized words in these languages is dramatically smaller.
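To illustrate why a fixed list falls behind, here is a minimal sketch of suffix expansion. The rules are hypothetical and heavily simplified; they are only loosely modeled on the strip/add/condition shape of Hunspell suffix rules and are not taken from any real dictionary.

```python
import re

# Hypothetical, simplified suffix rules:
# (characters to strip, characters to add, regex the stem ending must match)
SUFFIX_RULES = [
    ("", "'s", "."),            # possessive: keyboard -> keyboard's
    ("y", "ies", "[^aeiou]y"),  # plural: dictionary -> dictionaries
    ("", "s", "[^sxzhy]"),      # plural: keyboard -> keyboards
]


def expand(stem, rules=SUFFIX_RULES):
    """Return the stem plus every derived form the rules allow."""
    forms = {stem}
    for strip, add, condition in rules:
        if re.search(condition + "$", stem):
            base = stem[: len(stem) - len(strip)]
            forms.add(base + add)
    return forms


stems = ["keyboard", "dictionary"]
fixed_list = set(stems)  # a list that stores stems only
all_forms = set().union(*(expand(stem) for stem in stems))
missing = sorted(all_forms - fixed_list)  # valid words the fixed list lacks
```

Even with three toy rules, two stems produce six valid forms, four of which a stem-only list would reject. Languages with rich conjugation and declension have many more rules per stem, so the gap grows accordingly.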
Solution
The dictionaries used by Android keyboard apps should be generated from complete lists of correct words. Existing spellchecking dictionaries can be used to generate such lists. Word frequencies (used to order autocorrect suggestions) should be applied to the list, taking into account the relative frequency of different derived forms. The frequencies should also reflect the kind of text usually typed on mobile. Currently the words “study” and “therefore” are marked as more popular than “hi” or “guys”.
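One possible way to attach frequencies to a complete word list is sketched below. It assumes a corpus of text resembling what people type on mobile and uses simple add-one counting so that valid but unseen words stay in the dictionary; the function name, the `floor` parameter, and the sample corpus are illustrative assumptions, not how any keyboard actually builds its dictionary.

```python
from collections import Counter


def build_frequency_list(valid_words, corpus_tokens, floor=1):
    """Rank every valid word by how often it appears in the corpus.

    Each word gets its corpus count plus `floor`, so correct words that
    never occur in the corpus are kept with a low rank.
    """
    counts = Counter(token.lower() for token in corpus_tokens)
    entries = {word: counts.get(word, 0) + floor for word in valid_words}
    # Highest frequency first; ties are broken alphabetically.
    return sorted(entries.items(), key=lambda item: (-item[1], item[0]))


# Made-up chat-style corpus, for illustration only.
corpus = "hi guys hi there guys hi".split()
ranked = build_frequency_list({"hi", "guys", "study", "therefore"}, corpus)
```

With a chat-like corpus, “hi” and “guys” end up ranked above “study” and “therefore”, which is the ordering one would expect for text typed on a phone.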