by grimborg 4272 days ago
Interesting!

Sometimes it gets it almost right: I tried this piece of text in Catalan (Balearic variant) and it classified it as Portuguese (with Catalan as the 2nd option): "I s'horabaixa la deixam passar i me mires tan a prop que me fa mal, que surt es sol i encara plou, que t'estim massa i massa poc, que no sé com ho hem d'arreglar, que som amics, que som amants."

It's strange, because it's pretty different from Portuguese...

The Catalan poem "tirallonga de monosíl·labs" gets classified as French. (http://www.rodamots.com/calaix.asp?text=tirallonga)

1 comment

It sucks, right? Currently, it’s good at long passages. But for shorter inputs, the results are pretty poor. The number of supported languages is just too damn high!