by Typhon 2030 days ago
This is a rather bad article because it completely misses the real complexity of a spell checker.

A spell checker is not simply a list of words; it's a way to check for mistakes against a standard and to point towards ways to fix them. This is not reducible to a look-up in a hashtable. It requires taking into account some complicated things about the definition of a word and the context in which it is written. You might think that's grammar checking, but the boundary is not clear, and in any case any language processing application starts with tokenizing and deciding what counts as a word and on what basis.

What even is a word? Is "CIA" a word? What about "C.I.A."? What about C (as in the language)? What about c (as in the speed of light)? 2,4-Dinitrophenylhydrazine? How does the spellchecker handle dashes and apostrophes? What about proper nouns?
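
To make the tokenizing problem concrete, here is a minimal sketch of the "list of words plus a look-up" design. It is not taken from the article: the `WORDS` set and both function names are made up for illustration, and the splitting rule (break on anything that is not an ASCII letter, then lowercase) is just one naive choice among many.

```python
import re

# Hypothetical toy dictionary; a real word list would be far larger.
WORDS = {"the", "speed", "of", "light", "is", "a", "word", "do", "not"}


def naive_tokens(text):
    """Split on anything that is not an ASCII letter, then lowercase."""
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]


def naive_misspellings(text):
    """Return the tokens that are absent from WORDS (a plain set look-up)."""
    return [t for t in naive_tokens(text) if t not in WORDS]


if __name__ == "__main__":
    for sample in ["C.I.A.", "don't", "2,4-Dinitrophenylhydrazine", "CIA is a word"]:
        print(sample, naive_tokens(sample), naive_misspellings(sample))
```

With this rule, "C.I.A." falls apart into three one-letter tokens, "don't" becomes "don" and "t", the locants in "2,4-Dinitrophenylhydrazine" vanish, and lowercasing erases the difference between C and c. Each of those outcomes is a decision about what counts as a word, made before the look-up ever happens.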

Really, the example is poorly chosen.

3 comments

The feature creep is irrelevant to the article. In 1984, a spell checker could be just a list of words and would still be a major engineering challenge.
And what about the term "spell checker" itself? A routine that checks the validity of witches' spells?

That it can't even identify that it itself is a "spelling checker" illustrates the problem.

Had you used any spellcheckers from that era?