How bad is Android Google Keyboard

Using Google Keyboard to type in a language other than English can be really frustrating. The default keyboard on the Nexus 5X keeps changing correctly spelled Polish sentences into utter nonsense. It is a lot better for English, but the spell checker still doesn’t recognize many correctly spelled words.

To get some hard data on the subject, I compared the Google Keyboard dictionary with Hunspell, a popular spellchecking library used in Chrome, Firefox, Mac OS X, Ubuntu and many others. Depending on the language, Google Keyboard recognizes between 4% and 77% of the words known to Hunspell.

Language | Hunspell words | Recognized | Unrecognized | % Recognized
en_US | 152582 | 7747 | 2254 | 77%
es_ES | 885418 | 2480 | 7521 | 25%
pl_PL | 3624473 | 511 | 9490 | 5%
ru_RU | 4201083 | 361 | 9640 | 4%

You can find the test rig on GitHub. It generates a list of all words in the Hunspell dictionary and checks whether Google Keyboard recognizes randomly selected words. It uses the Android Spell Checker API because that is easier to test automatically than autocomplete, and the two apparently use the same dictionary.
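The sampling step can be sketched roughly as below. This is not the code from the test rig, only a minimal illustration in Python: the function name `recognition_rate`, the `is_recognized` callback (standing in for a call to the Android spell checker) and the toy word lists are all assumptions made for the example. In the table above, the Recognized and Unrecognized columns add up to 10,001 sampled words per language.

```python
import random


def recognition_rate(words, is_recognized, sample_size, seed=0):
    """Check a random sample of words and return (recognized, unrecognized, rate)."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    sample = rng.sample(words, min(sample_size, len(words)))
    recognized = sum(1 for word in sample if is_recognized(word))
    unrecognized = len(sample) - recognized
    rate = recognized / len(sample) if sample else 0.0
    return recognized, unrecognized, rate


# Toy data standing in for the Hunspell word list and the keyboard dictionary.
hunspell_words = ["cat", "cats", "cat's", "dog"]
keyboard_words = {"cat", "dog"}

recognized, unrecognized, rate = recognition_rate(
    hunspell_words, keyboard_words.__contains__, sample_size=4
)
print(f"{recognized} recognized, {unrecognized} unrecognized, {rate:.0%}")
```

In a real run the callback would ask the device's spell checker about each word, and the sample size would be in the thousands rather than four.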

Diagnosis

Google Keyboard uses a fixed list of around 200k words and cannot recognize many derived words created by appending suffixes (e.g. ~’s for English nouns in the possessive form). The other tested languages have a much larger number of suffixes and prefixes because of conjugation and declension. This explains why the percentage of recognized words in these languages is dramatically smaller.
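To see why a fixed word list falls behind, here is a small sketch of suffix expansion. The rule format (characters to strip, suffix to add, a regular expression the stem's ending must match) is a simplification made up for this example and is not the real Hunspell affix file syntax; the three English rules are likewise only illustrative.

```python
import re

# Simplified, hypothetical suffix rules: (strip, add, condition on stem ending).
SUFFIX_RULES = [
    ("", "'s", r"."),             # possessive: study -> study's
    ("", "s", r"[^sxy]"),         # plain plural: keyboard -> keyboards
    ("y", "ies", r"[^aeiou]y"),   # consonant + y plural: study -> studies
]


def expand(stem, rules=SUFFIX_RULES):
    """Return the stem together with every form the suffix rules derive from it."""
    forms = {stem}
    for strip, add, condition in rules:
        if stem.endswith(strip) and re.search(condition + "$", stem):
            base = stem[: len(stem) - len(strip)]
            forms.add(base + add)
    return forms


for stem in ["keyboard", "study"]:
    print(stem, "->", sorted(expand(stem)))
```

Even with three rules every stem turns into three words. A language with dozens of declension and conjugation endings multiplies each stem many times over, which a list of around 200k entries cannot cover.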

Solution

The dictionaries used by Android keyboard apps should be generated from complete lists of correct words. Available spellchecking dictionaries can be used to generate such lists. The frequencies of different words (used to order autocorrect suggestions) should be applied to the list, taking into account the relative frequency of different derived forms. The frequencies should reflect the nature of texts usually typed on mobile. Currently the words “study” and “therefore” are marked as more popular than “hi” or “guys”.
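One possible way to apply frequencies to derived forms is sketched below. It assumes that a frequency is known per stem and that each derived form has a relative share; the function names and all the numbers are invented for illustration and do not come from any real keyboard dictionary.

```python
def distribute_frequency(stem_frequency, form_shares):
    """Split a stem's frequency across its derived forms in proportion to their shares."""
    total = sum(form_shares.values())
    if total == 0:
        return {form: 0.0 for form in form_shares}
    return {
        form: stem_frequency * share / total
        for form, share in form_shares.items()
    }


def rank_suggestions(frequencies):
    """Order words from most to least frequent, as a suggestion list would."""
    return sorted(frequencies, key=frequencies.get, reverse=True)


# Made-up numbers: the stem frequency and the shares are assumptions.
study_forms = distribute_frequency(100, {"study": 6, "studies": 3, "study's": 1})
print(study_forms)
print(rank_suggestions(study_forms))
```

The stem frequency itself would have to come from a corpus that resembles what people type on a phone, so that “hi” ends up ahead of “therefore”.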

Resurrecting the blog

It’s been five years since I last posted here.
Not that I haven’t got anything interesting to say; I was just too occupied with my day-to-day work and, well… life. Working in the confidentiality-obsessed banking industry didn’t help much with sharing my work-related thoughts either.

My focus changed initially from Flex to HTML5/JavaScript and then to Android.

Just like with Flex, with Android I couldn’t resist peeking under the hood and figuring out how things work. And, as before, I ended up contributing to the project and relying on my in-depth understanding of the system in my day job.

Some of my findings regarding AOSP (Android Open Source Project) seem worth sharing so I decided to resurrect this blog.

String Concatenation

I am working on exporting DataGrid content to CSV (I will present the result soon) and one question came to my mind:

Is ActionScript 3 String concatenation efficient?

After some googling I found Peter Farland’s blog post, which suggested using the String.fromCharCode() method instead of string concatenation. Although my case was a little different, because I needed to concatenate whole item labels (not single characters), I decided to create a simple StringBuffer based on fromCharCode().
