by redox_ 4592 days ago
Store only the top N common, non-ambiguous words if RAM consumption matters ;)
1 comment

Or store the lexicon in a deterministic acyclic finite state automaton. E.g. (shameless plug):

https://github.com/danieldk/dictomaton
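To give a rough idea of why such an automaton saves memory, here is a minimal Python sketch. It is not dictomaton's API or its construction algorithm (the names `build_dafsa` and `contains` are made up for illustration); it simply builds a plain trie and then merges identical subtrees, which yields a minimal acyclic automaton for a small word list.

```python
def build_dafsa(words):
    """Build a minimal acyclic automaton by merging identical trie subtrees."""
    # Step 1: plain trie. Each node is [is_final, {char: child_node}].
    root = [False, {}]
    for word in words:
        node = root
        for ch in word:
            node = node[1].setdefault(ch, [False, {}])
        node[0] = True

    registry = {}  # signature -> canonical state id
    nodes = []     # state id -> (is_final, {char: child state id})

    def minimize(node):
        # Minimize children first, so equal subtrees end up with the same id.
        edges = {ch: minimize(child) for ch, child in node[1].items()}
        signature = (node[0], tuple(sorted(edges.items())))
        if signature not in registry:
            registry[signature] = len(nodes)
            nodes.append((node[0], edges))
        return registry[signature]

    start = minimize(root)
    return start, nodes


def contains(dafsa, word):
    """Return True if the automaton accepts the word."""
    start, nodes = dafsa
    state = start
    for ch in word:
        state = nodes[state][1].get(ch)
        if state is None:
            return False
    return nodes[state][0]


# Toy lexicon, chosen so that prefixes and suffixes overlap.
words = ["talk", "talked", "talking", "walk", "walked", "walking"]
dafsa = build_dafsa(words)
print(contains(dafsa, "walked"), contains(dafsa, "walks"), len(dafsa[1]))
```

For this toy lexicon the plain trie has 19 nodes, while the merged automaton needs 9 states, because the shared "alk", "ed" and "ing" paths are stored once. A real implementation would build the automaton incrementally from sorted input instead of materializing the full trie first.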

Though, having implemented a language guesser myself, I found it's only an issue with very short texts (a few words). On longer texts, models based on character n-grams achieve very high accuracy.
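For readers who haven't seen a character n-gram model, a minimal sketch in Python follows. It is not the guesser mentioned above: the function names, the trigram size, the add-one smoothing, and the two tiny training sentences are all assumptions picked for illustration, and a usable model would be trained on far more text per language.

```python
import math
from collections import Counter


def ngrams(text, n=3):
    """Character n-grams of the lowercased text, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3):
    """Count n-grams per language. `samples` maps language -> training text."""
    return {lang: Counter(ngrams(text, n)) for lang, text in samples.items()}


def guess(text, models, n=3):
    """Pick the language whose n-gram counts give the text the highest score."""
    vocab = len(set().union(*models.values()))  # distinct n-grams overall

    def score(counts):
        total = sum(counts.values())
        # Sum of log-probabilities with add-one smoothing for unseen n-grams.
        return sum(math.log((counts[g] + 1) / (total + vocab))
                   for g in ngrams(text, n))

    return max(models, key=lambda lang: score(models[lang]))


# Made-up toy training data, far too small for real use.
samples = {
    "en": "the quick brown fox jumps over the lazy dog and then it runs through the forest",
    "de": "der schnelle braune fuchs springt dann durch den wald und der hund folgt ihm",
}
models = train(samples)
print(guess("the dog runs over there", models))
print(guess("der hund springt durch den wald", models))
```

With only a few words of input there are few n-grams to score, which is where a word lexicon helps; on longer inputs the summed log-probabilities separate the languages much more clearly.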