by bjourne 2270 days ago
That's an enormous topic and an enormous can of worms. Modern spell checkers generally use statistical methods, meaning that they are trained on a corpus. That allows them to recognize that the sequence of tokens [what, i, would, like, to, get, into, is, :] is much more probable than [get, what, i, would, like, into, is, :]. I.e., the latter gets a low probability, which suggests it is grammatically incorrect.

A good start is to learn about Markov models. For more sophisticated stuff, investigate word vectors and language modeling using recurrent neural networks. The Python library NLTK comes with a free book which can teach you the basics.
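
To make the Markov model idea concrete, here is a minimal sketch of a bigram model that scores the two orderings mentioned above. The toy corpus, the function names (train_bigram, log_prob) and the add-one smoothing are my own assumptions for illustration; a real spell checker would train on a far larger corpus and likely use better smoothing.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences.

    Each sentence is padded with <s> and </s> markers so that sentence
    starts and ends are modeled too.
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def log_prob(tokens, unigrams, bigrams):
    """Log-probability of a token sequence under the bigram model.

    Uses add-one (Laplace) smoothing so unseen bigrams get a small
    non-zero probability instead of log(0).
    """
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total

# Hypothetical toy corpus, only for illustration.
corpus = [
    "what i would like to get into is nlp".split(),
    "what i would like to do is learn".split(),
    "i would like to get into programming".split(),
]
unigrams, bigrams = train_bigram(corpus)

good = "what i would like to get into is".split()
bad = "get what i would like to into is".split()  # same words, shuffled

print("good:", log_prob(good, unigrams, bigrams))
print("bad: ", log_prob(bad, unigrams, bigrams))
```

The shuffled sequence scores lower because transitions like (get, what) and (to, into) never occur in the training data. Both sequences contain the same words, so the difference comes purely from word order, which is exactly what a first-order Markov model captures.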