by Someone 4816 days ago
On the other hand, compounding languages (http://en.wikipedia.org/wiki/Compound_(linguistics)) such as Dutch and German are somewhat harder. Words of over 20 characters are fairly common in both, so it is not surprising that Wikipedia's entry on word length is available only in those languages (http://nl.wikipedia.org/wiki/Woordlengte).

If you meet a word that looks nothing like anything you have seen before, you must check whether you can construct it (or something similar to it) from constituent parts that are each in your dictionary. That can be difficult, because you must also know how compound words can be constructed. For example, a preposition can be a part, but (typically?) not at the end of a word; you (typically?) have at most one verb; adjectives (typically?) show up only at the start; etc. (The 'typically?' disclaimers show my ignorance.)
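
A rough sketch of the "construct it from constituent parts" step, in Python. The word list and the name split_compound are made up for illustration, and it ignores linking elements (such as the German -s- between parts) and every grammatical constraint, so treat it as a starting point only:

```python
from functools import lru_cache

# Toy dictionary; purely illustrative, not a real word list.
DICTIONARY = frozenset({"black", "eye", "eyed", "white", "key", "keyed", "gray"})


def split_compound(word, dictionary=DICTIONARY):
    """Return every way to write `word` as a sequence of dictionary words."""

    @lru_cache(maxsize=None)
    def splits(start):
        # Reaching the end of the word means the parts so far cover it fully.
        if start == len(word):
            return ((),)
        results = []
        for end in range(start + 1, len(word) + 1):
            part = word[start:end]
            if part in dictionary:
                # Prepend this part to every split of the remainder.
                for rest in splits(end):
                    results.append((part,) + rest)
        return tuple(results)

    return list(splits(0))
```

With this toy dictionary, split_compound("blackeyed") finds the single split ("black", "eyed"), and a string with no dictionary parts gives an empty list. Note that it also happily splits "eyedblack", which is exactly why the construction rules matter.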

A decent spelling checker needs to handle this, as it should signal missing spaces between words. For example, "blackeyed", "eyewhite", "blackkeyed", "grayblack" and "blackgray" are acceptable, but "eyedblack" is not.
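
To make those examples concrete, here is a minimal sketch of such a check in Python. Everything in it is an assumption on my part: the tiny lexicon, the hand-assigned tags, and above all the single rule (a participle such as "eyed" may close a compound but not open one), which I picked only because it happens to separate the examples above. Real rules are far messier.

```python
# Toy lexicon with hand-assigned tags (assumed, not taken from a real resource).
LEXICON = {
    "black": "adj", "gray": "adj", "white": "adj",
    "eye": "noun", "eyed": "participle", "keyed": "participle",
}


def two_part_splits(word):
    """Return all (head, tail) pairs where both halves are in the lexicon."""
    pairs = []
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in LEXICON and tail in LEXICON:
            pairs.append((head, tail))
    return pairs


def is_valid_compound(head, tail):
    # Assumed rule for this sketch only: a participle may not open a compound.
    # `tail` is unused here but kept so further rules can inspect it.
    return LEXICON[head] != "participle"


def check(word):
    """Return 'ok', a missing-space suggestion, or 'unknown'."""
    if word in LEXICON:
        return "ok"
    pairs = two_part_splits(word)
    if any(is_valid_compound(head, tail) for head, tail in pairs):
        return "ok"
    if pairs:
        # Both halves are words, but not a legal compound: suggest a space.
        head, tail = pairs[0]
        return f"missing space? '{head} {tail}'"
    return "unknown"
```

Here check("blackeyed") and check("grayblack") come back as "ok", while check("eyedblack") suggests 'eyed black'.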