by knowledgesale 4817 days ago
A basic spellchecker is extremely simple to implement:

http://norvig.com/spell-correct.html
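For a sense of scale, here is a condensed sketch along the lines of the corrector in Norvig's article: count words, generate every string one or two edits away, and pick the most frequent known candidate. The tiny inline `CORPUS` is an assumption standing in for the large training text the article uses, so the suggestions are only as good as that toy word list.

```python
import re
from collections import Counter

# Assumption: a tiny inline corpus replaces the large training text
# that Norvig's article counts words from.
CORPUS = """
the spelling corrector counts words in a corpus and picks the most
frequent known word close to the input the corrector is simple
"""

WORDS = Counter(re.findall(r"\w+", CORPUS.lower()))
TOTAL = sum(WORDS.values())
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def probability(word):
    """Relative frequency of `word` in the corpus (0 if unseen)."""
    return WORDS[word] / TOTAL


def known(words):
    """Subset of `words` that appear in the corpus."""
    return {w for w in words if w in WORDS}


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def correction(word):
    """Most probable known candidate, preferring smaller edit distances."""
    candidates = (known([word]) or known(edits1(word))
                  or known(edits2(word)) or {word})
    return max(candidates, key=probability)
```

With this corpus, `correction("speling")` returns `"spelling"`, and a word with no known neighbours within two edits is returned unchanged.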

An instance running similar code could be hosted on, say, Heroku (it would even be free at low traffic) behind a Flask wrapper:

http://flask-restful.readthedocs.org/en/latest/api.html

If the output format is reproduced, all it would take is changing the reference URL.
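A minimal, dependency-free stand-in for the Flask wrapper: a plain WSGI app using only the standard library (Flask apps are WSGI apps too, so a Flask route would look similar). The `/check?word=...` route, the JSON shape and the five-word `WORDS` set are all hypothetical; to be a drop-in replacement, the response would have to match whatever format the existing service returns.

```python
import json
import os
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

# Hypothetical tiny dictionary; a real service would load a large word list.
WORDS = {"spelling", "checker", "simple", "implement", "correct"}
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def suggest(word):
    """The word itself if known, otherwise known words one edit away."""
    word = word.lower()
    if word in WORDS:
        return [word]
    return sorted(edits1(word) & WORDS)


def app(environ, start_response):
    """WSGI app: GET /check?word=... -> {"word": ..., "suggestions": [...]}."""
    if environ.get("PATH_INFO", "/") != "/check":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    params = parse_qs(environ.get("QUERY_STRING", ""))
    word = params.get("word", [""])[0]
    body = json.dumps({"word": word, "suggestions": suggest(word) if word else []})
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body.encode("utf-8")]


def serve(port=None):
    """Run a development server; Heroku passes the port in the PORT env var."""
    port = port or int(os.environ.get("PORT", "8000"))
    with make_server("", port, app) as server:
        server.serve_forever()
```

Locally, `serve()` listens on port 8000, and a request to `/check?word=simpel` returns `{"word": "simpel", "suggestions": ["simple"]}`.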


>A basic spellchecker is extremely simple to implement

That reminds me of this article: http://prog21.dadgum.com/29.html "A Spellchecker Used to Be a Major Feat of Software Engineering"

Every time I read a Norvig article I can feel myself get a little bit cleverer.

Edit: And dumber, until I work through the examples twice

NB: Chrome's spell checker has an edit distance of one. I learnt something useful and immediately applicable from Norvig; I tell you, this guy is great :-)
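To make "edit distance of one" concrete, here is a sketch of the optimal string alignment variant of Damerau-Levenshtein distance, which counts the same four operations Norvig's corrector generates (deletion, insertion, substitution, adjacent transposition). Whether Chrome uses exactly this metric is an assumption the comment does not settle; `suggestions_within` is a hypothetical helper for filtering a word list.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions, substitutions
    and adjacent transpositions each cost 1 (no substring is edited twice)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def suggestions_within(word, dictionary, max_distance=1):
    """Dictionary words no more than `max_distance` edits away from `word`."""
    return [w for w in dictionary if osa_distance(word, w) <= max_distance]
```

For example, `suggestions_within("teh", ["the", "ten", "tea", "toe"])` keeps `"the"`, `"ten"` and `"tea"` but drops `"toe"`, which is two edits away.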

It is simple in the case of English. For highly inflectional languages [1] it is quite challenging.

[1] http://en.wikipedia.org/wiki/Inflection

You have not even glanced at the referenced article, have you? It is not conceptually harder; it is pretty much the same thing.

I played with some similar tasks for my native Russian. You can just add a layer of hash tables/dictionaries that links each inflected form to the original word. There are inflections in English too (and they are even harder for spelling purposes, as forms often differ by only one character), so it is conceptually similar.
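A rough sketch of that extra dictionary layer, assuming a hand-written list of inflected forms per lemma (a real system would generate them from paradigm tables or use a morphological analyser). Forms are checked and corrected as ordinary words, and the `LEMMA_OF` table links each one back to its base word. The data and the `check` function are illustrative only.

```python
# Hypothetical miniature data: a handful of inflected forms per lemma,
# listed by hand instead of being generated from paradigm tables.
FORMS = {
    "книга": ["книга", "книги", "книге", "книгу", "книгой",
              "книг", "книгам", "книгами", "книгах"],
    "стол": ["стол", "стола", "столу", "столом", "столе",
             "столы", "столов", "столам", "столами", "столах"],
}

# The extra layer: every known form points back to its base word.
LEMMA_OF = {form: lemma for lemma, forms in FORMS.items() for form in forms}
ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def check(word):
    """Known forms matching `word` (exactly, else within one edit),
    each paired with the lemma it belongs to."""
    word = word.lower()
    if word in LEMMA_OF:
        return [(word, LEMMA_OF[word])]
    return sorted((form, LEMMA_OF[form]) for form in edits1(word) if form in LEMMA_OF)
```

For example, the misspelling `check("стлом")` returns `[("столом", "стол")]`: the correction is found among the forms, and the lemma comes along for free.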

On the other hand, compounding languages (http://en.wikipedia.org/wiki/Compound_(linguistics)) such as Dutch and German are somewhat harder. Words of over 20 characters are fairly common in Dutch and German, so it is not surprising that Wikipedia's article on word length is only available in those languages (http://nl.wikipedia.org/wiki/Woordlengte).

If you meet a word that looks nothing like anything you have seen before, you must check whether you can construct it (or something similar to it) from constituent parts that are each in your dictionary. That can be difficult, because you must also know how compound words can be constructed. For example, a preposition can be a part, but (typically?) not at the end of a word; you (typically?) have at most one verb; adjectives (typically?) show up at the start only; etc. (The 'typically?' disclaimers show my ignorance.)

A decent spelling checker needs to handle this, as it should signal missing spaces between words. For example, "blackeyed", "eyewhite", "blackkeyed", "grayblack" and "blackgray" are acceptable, but "eyedblack" is not.
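A toy sketch of the segmentation idea, assuming a tiny hand-made lexicon with part-of-speech tags and two made-up rules in the spirit of the "(typically?)" constraints above: a preposition may not come last, and a participle such as "eyed" or "keyed" may only come last. The adjective rule is left out, since "grayblack" would break it. `LEXICON`, `splits`, `allowed` and `is_valid_compound` are all hypothetical; real compounding rules are far more involved.

```python
from functools import lru_cache

# Hypothetical toy lexicon: word -> part-of-speech tag.
LEXICON = {
    "black": "adj", "gray": "adj", "white": "adj",
    "eye": "noun", "key": "noun",
    "eyed": "participle", "keyed": "participle",
    "over": "prep",
}


def splits(word, min_len=2):
    """Every way to write `word` as a concatenation of lexicon entries."""
    @lru_cache(maxsize=None)
    def from_position(start):
        if start == len(word):
            return ((),)
        found = []
        for end in range(start + min_len, len(word) + 1):
            part = word[start:end]
            if part in LEXICON:
                found.extend((part,) + rest for rest in from_position(end))
        return tuple(found)

    return list(from_position(0))


def allowed(tags):
    """Made-up rules for illustration only: a preposition may not be the
    last part, and a participle may only be the last part."""
    if tags[-1] == "prep":
        return False
    return "participle" not in tags[:-1]


def is_valid_compound(word):
    """True if `word` is in the lexicon or splits into 2+ parts obeying the rules."""
    if word in LEXICON:
        return True
    return any(len(parts) > 1 and allowed([LEXICON[p] for p in parts])
               for parts in splits(word))
```

With these rules, "blackeyed", "eyewhite", "blackkeyed", "grayblack" and "blackgray" are accepted, while "eyedblack" (participle first) and "blackover" (preposition last) are flagged.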

I found the spell checking in Vim to be quite awesome as well. I'm not sure what algorithm is behind it, but it seems to give good results even when I totally butcher a word.
IIRC Vim has been using (or is able to use) the OpenOffice spelling dictionaries, which are in turn based on Hunspell, which is extremely widely used (OOo, OS X, Chrome, Opera, Eclipse, Mozilla, etc.).
Uh, not 'based on': OOo and LO use Hunspell.
Good idea if someone has a large number of WP installs (for clients, etc.).