by akie 1590 days ago
We're using libraries like this to guess the language of a book from its title alone (when no other information is readily available), and trigram-based algorithms get it wrong often enough to be noticeable. I will look into replacing our current library with this one; it seems better suited to the task at hand.
1 comment

Yeah, language detection on short texts is quite hard. In my experience, n-grams don't work well for them.
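
A rough sketch of why that might be, assuming a plain character-trigram profile with space padding (the helper name `char_trigrams` is made up for illustration and is not taken from any particular library): a short title simply yields very few trigrams to compare against a language model.

```python
from collections import Counter


def char_trigrams(text: str) -> Counter:
    """Count overlapping character trigrams in lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


# A one-word title produces only a handful of trigrams, so there is
# very little evidence to match against per-language trigram statistics.
title = "Hamlet"  # hypothetical example title
profile = char_trigrams(title)
print(sorted(profile))
print("total trigrams:", sum(profile.values()))
```

With so few observations, one or two trigrams that happen to be common in another language can easily tip the result.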