Hacker News
by knowledgesale 4818 days ago
You have not even glanced at the linked article, have you? It is not conceptually harder; it is pretty much the same thing.

I played with some similar tasks for my native Russian. You can just add a layer of hash tables/dictionaries that maps each inflected form back to its original word. English has inflections too (and they are even harder for spelling purposes, as forms often differ by a single character), so it is conceptually similar.
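To make the "extra layer of dictionaries" idea concrete, here is a minimal sketch. The word lists and the `lookup` helper are made up for illustration; a real checker would generate the inflected forms from a proper morphological dictionary instead of listing them by hand.

```python
# Hypothetical toy data: a set of base words plus a second table that
# maps each inflected form back to its base word.
BASE_WORDS = {"walk", "run", "eye"}
INFLECTED_TO_BASE = {
    "walks": "walk",
    "walked": "walk",
    "running": "run",
    "ran": "run",
    "eyes": "eye",
}


def lookup(word):
    """Return the base word for a known form, or None if the form is unknown."""
    word = word.lower()
    if word in BASE_WORDS:
        return word
    # The extra layer: inflected form -> original word.
    return INFLECTED_TO_BASE.get(word)
```

With this toy data, `lookup("walked")` gives `"walk"`, while a misspelling such as `"wlak"` gives `None` and would then go to whatever correction step the checker uses.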

1 comment

On the other hand, compounding languages (http://en.wikipedia.org/wiki/Compound_(linguistics)) such as Dutch and German are somewhat harder. Words of over 20 characters are fairly common in both, so it is not surprising that Wikipedia's article on word length (http://nl.wikipedia.org/wiki/Woordlengte) is only available in those languages.

If you meet a word that looks nothing like anything you have seen before, you must check whether you can construct it (or something similar to it) from constituent parts that are each in your dictionary. That can be difficult, because you must also know how compound words can be constructed. For example, a preposition can be a part, but (typically?) not at the end of a word; you (typically?) have at most one verb; adjectives (typically?) show up only at the start; and so on. (The 'typically?' disclaimers show my ignorance.)
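A rough sketch of that check, under heavy assumptions: the tiny tagged lexicon is invented, the three rules are just the tentative ones listed above, and "adjectives at the start only" is read as "no adjective after the first non-adjective part". Real Dutch or German compounding has more rules (linking letters, for one) that this ignores.

```python
# Hypothetical toy lexicon: word -> part of speech. A real checker would
# load a full tagged dictionary for the language in question.
LEXICON = {
    "black": "adj",
    "gray": "adj",
    "bird": "noun",
    "house": "noun",
    "over": "prep",
    "look": "verb",
    "walk": "verb",
}


def segmentations(word):
    """Yield every way to split `word` into parts that are all in LEXICON."""
    if not word:
        yield []
        return
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in LEXICON:
            for rest in segmentations(word[i:]):
                yield [head] + rest


def plausible(parts):
    """Apply the three tentative construction rules to a list of parts."""
    if len(parts) < 2:
        return False  # a single part is not a compound
    tags = [LEXICON[p] for p in parts]
    if tags[-1] == "prep":
        return False  # no preposition at the end
    if tags.count("verb") > 1:
        return False  # at most one verb
    seen_non_adjective = False
    for tag in tags:
        if tag == "adj" and seen_non_adjective:
            return False  # adjectives only in the leading run
        if tag != "adj":
            seen_non_adjective = True
    return True


def compound_parts(word):
    """Return the first plausible split of `word`, or None if there is none."""
    for parts in segmentations(word.lower()):
        if plausible(parts):
            return parts
    return None
```

With this lexicon, `compound_parts("blackbird")` returns `["black", "bird"]`, while `"birdblack"` (adjective after a noun) and `"lookover"` (preposition at the end) both return `None`. The recursion tries every split, which is fine for a sketch but would want memoization on long words.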

A decent spelling checker needs to handle this, as it should signal missing spaces between words. For example, "blackeyed", "eyewhite", "blackkeyed", "grayblack" and "blackgray" are acceptable, but "eyedblack" is not.