| HN Mirror



	by wooorm 4276 days ago
	Ha! Some very nice examples, I have to say :) Anyway, you’re completely right. Italian comes back as `und` because the input is 10 characters or fewer; the others are slightly off because the input is short too, but the demo (http://wooorm.github.io/franc/) does show the correct languages in second or third place!

1 comments

jodent 4276 days ago

No, it doesn't. It still takes French for Catalan (French only comes in third place, after Italian), and Swedish for Dutch. (Arguably those are close languages, but hey, this is why I'm using this, right?)


wooorm 4276 days ago

By `correct language` I mean the language you expect; by `second` and `third` I mean `2.` and `3.` in the previously mentioned demo (http://wooorm.github.io/franc/). I think we’re talking about the same thing!

Anyway, yeah, franc is for language detection, but it’s optimised for many languages and works best on longer text. It’s a trade-off. For fewer languages and shorter texts, check out https://github.com/shuyo/ldig
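
To make the `und` and "second or third place" points concrete, here is a toy sketch in Python. It is not franc's code: the sample sentences, the `MIN_LENGTH` constant, and the `rank` function are all made up for illustration, and the scoring is a plain trigram overlap rather than whatever franc does internally. It only shows the general shape: input that is too short gets `und`, everything else gets a ranked list where the expected language may not come first.

```python
from collections import Counter

# Assumed cutoff, taken from the "10 characters or fewer" remark above.
MIN_LENGTH = 10

# Tiny hypothetical samples; a real detector uses far larger profiles.
SAMPLES = {
    "eng": "the quick brown fox jumps over the lazy dog and then the dog sleeps",
    "nld": "de snelle bruine vos springt over de luie hond en dan slaapt de hond",
    "fra": "le renard brun rapide saute par dessus le chien paresseux et le chien dort",
}


def trigrams(text):
    """Count overlapping 3-character sequences, padding both ends with a space."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def rank(text):
    """Return (language, score) pairs, best first; short input yields only `und`."""
    if len(text.strip()) <= MIN_LENGTH:
        return [("und", 1.0)]
    grams = trigrams(text)
    total = sum(grams.values())
    # Score = share of the input's trigrams that also occur in the profile.
    scores = {
        lang: sum(n for g, n in grams.items() if g in profile) / total
        for lang, profile in PROFILES.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank("ciao")` gives only `und`, while a longer sentence gives every candidate with a score. With closely related languages and short input the scores end up close together, which is how the language you expect can land in `2.` or `3.` instead of `1.`.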
