Hacker News
by karanlyons 2814 days ago
You’re going to want to look at https://en.wikipedia.org/wiki/Kneser–Ney_smoothing for further improvements on an n-gram-based approach.
2 comments

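Or even just Stupid Backoff[1]: if you've never seen the n-gram of length N, back off to length N-1 and multiply the score by a fixed factor (0.4 in the paper). Note that what you get out is a relative score, not a normalized probability.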

[1] Page 2, equation 5 of https://www.aclweb.org/anthology/D07-1090.pdf#page=2
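
For anyone who wants to see it spelled out, here's a rough Python sketch of that scheme as I read it. The function names (`count_ngrams`, `stupid_backoff`) and the tuple-keyed `Counter` layout are my own illustration and not from the paper; only the back-off rule and the 0.4 factor come from [1].

```python
from collections import Counter

ALPHA = 0.4  # fixed back-off factor; 0.4 is the value mentioned above


def count_ngrams(tokens, max_order):
    """Count every n-gram of order 1..max_order; keys are tuples of tokens."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(word, context, counts, total_tokens, alpha=ALPHA):
    """Score `word` given `context` (the preceding tokens, oldest first).

    Returns a relative score, not a normalized probability.
    """
    penalty = 1.0
    context = tuple(context)
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            # Seen this n-gram: relative frequency, scaled by the penalty
            # accumulated from earlier back-off steps.
            return penalty * counts[ngram] / counts[context]
        # Unseen: drop the oldest context token and pay the fixed penalty.
        context = context[1:]
        penalty *= alpha
    # No context left: fall back to the unigram relative frequency.
    return penalty * counts[(word,)] / total_tokens


tokens = "the cat sat on the mat".split()
counts = count_ngrams(tokens, max_order=3)

# Trigram "on the mat" was seen, so no back-off is needed.
seen = stupid_backoff("mat", ("on", "the"), counts, len(tokens))
# "cat the mat" was not seen, so this backs off once to the bigram "the mat".
backed_off = stupid_backoff("mat", ("cat", "the"), counts, len(tokens))
```

The appeal is that there's no discounting or normalization to compute, just raw counts and one constant, which is why it scales to very large corpora.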

I will. Thanks for the suggestion.