by wooorm 4276 days ago
Ha! Some very nice examples, I have to say :)

Anyway, you’re completely right. Italian comes out as `und` (undetermined) because the input is 10 characters or fewer, and the others are slightly off because of short input too. The demo (http://wooorm.github.io/franc/) does show the correct languages in second or third place, though!


No, it doesn't: it still takes French for Catalan (French only comes in third place, after Italian), and Swedish for Dutch. (Arguably those are close languages, but hey, this is why I'm using this, right?)
By `correct language` I mean the language you expect, and by `second` and `third` I mean `2.` and `3.` in the previously mentioned demo (http://wooorm.github.io/franc/). I think we’re talking about the same thing!

Anyway, yeah, franc is for language detection, but it’s optimised for many languages and works best on longer text. It’s a trade-off. For fewer languages and shorter texts, check out https://github.com/shuyo/ldig
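The two behaviours discussed in this thread (`und` for very short input, and the expected language sometimes landing only 2nd or 3rd) can be illustrated with a toy ranker that compares character trigrams, a common approach for this kind of detector. This is a minimal sketch under stated assumptions, not franc's implementation or API: `detectAll`, `UND_THRESHOLD` and the tiny per-language trigram profiles are all hypothetical.

```typescript
// Toy trigram-based language ranking. NOT franc's API or data:
// detectAll(), UND_THRESHOLD and PROFILES are hypothetical illustrations.

type Ranking = Array<[string, number]>;

// Hypothetical cut-off: inputs of this many characters or fewer are "und".
const UND_THRESHOLD = 10;

// Tiny hand-made profiles: a few common trigrams per language.
// A real detector would use far larger profiles built from corpora.
const PROFILES: Record<string, string[]> = {
  ita: [" di", "che", "la ", " la", "to ", "ell"],
  fra: [" de", "es ", "le ", " le", "ent", " la"],
  nld: [" de", "en ", "het", " he", "van", "an "],
};

// Split text into overlapping 3-character windows, padded with spaces.
function trigrams(text: string): string[] {
  const padded = ` ${text.toLowerCase().replace(/\s+/g, " ").trim()} `;
  const grams: string[] = [];
  for (let i = 0; i + 3 <= padded.length; i++) {
    grams.push(padded.slice(i, i + 3));
  }
  return grams;
}

// Score each language by the fraction of the input's trigrams found in its
// profile, then sort best-first. Short input yields few trigrams, so scores
// sit close together and the expected language can end up 2nd or 3rd.
function detectAll(text: string): Ranking {
  if (text.trim().length <= UND_THRESHOLD) {
    return [["und", 1]];
  }
  const grams = trigrams(text);
  const ranking: Ranking = Object.entries(PROFILES).map(([lang, profile]) => {
    const hits = grams.filter((g) => profile.includes(g)).length;
    return [lang, hits / grams.length] as [string, number];
  });
  return ranking.sort((a, b) => b[1] - a[1]);
}

console.log(detectAll("ciao"));
console.log(detectAll("het huis van de man en de vrouw"));
```

With only a handful of characters there are only a handful of trigrams to compare, so the evidence for each language is thin; longer text gives every profile more to match against, which is one way to read the "works best on longer text" trade-off.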