by breuderink 4278 days ago
I don't know; I think I used about 40 languages. The beauty is that zip compression captures rich statistical properties of each language, so representation-wise it should go a long way. But counting the length of the compressed output discretises the language-to-language distance. For shorter texts this might be troubling, since it could easily result in ties. So, maybe. Perhaps I should try :).
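A minimal sketch of the compression-length idea in Python follows. It assumes Python's `zlib` (DEFLATE, the same algorithm zip uses) and a common concatenate-and-compare scheme: the distance from a text to a language is the number of extra compressed bytes needed when the text is appended to a reference corpus for that language. The comment does not say exactly how its distance was computed, so this is an illustration rather than the method used. The three tiny reference corpora and the helper names `distance`, `scores` and `identify` are hypothetical. Because the score is an integer byte count, `identify` returns every language that shares the minimum, which makes the possible ties explicit.

```python
import zlib

# Hypothetical toy reference corpora; a real setup would use several
# kilobytes of text per language, for many more languages.
REFERENCES = {
    "en": "the cat sat on the mat and the dog slept in the sun. "
          "it was a quiet day and nothing much happened in the house. ",
    "nl": "de kat zat op de mat en de hond sliep in de zon. "
          "het was een rustige dag en er gebeurde niet veel in het huis. ",
    "de": "die katze sass auf der matte und der hund schlief in der sonne. "
          "es war ein ruhiger tag und im haus geschah nicht viel. ",
}


def compressed_size(data: str) -> int:
    """Length in bytes of the zlib (DEFLATE) compressed UTF-8 encoding."""
    return len(zlib.compress(data.encode("utf-8"), 9))


def distance(reference: str, text: str) -> int:
    """Extra compressed bytes needed to encode `text` after `reference`.

    The result is an integer, so the distance is discretised: for short
    texts, several languages can end up with exactly the same score.
    """
    return compressed_size(reference + text) - compressed_size(reference)


def scores(text: str) -> dict[str, int]:
    """Distance from `text` to every reference language."""
    return {lang: distance(ref, text) for lang, ref in REFERENCES.items()}


def identify(text: str) -> list[str]:
    """All languages with the minimal distance; more than one means a tie."""
    s = scores(text)
    best = min(s.values())
    return sorted(lang for lang, d in s.items() if d == best)


if __name__ == "__main__":
    print(scores("de hond sliep in de zon"))
    print(identify("de hond sliep in de zon"))
```

Running it prints the per-language byte counts and the winning language(s). Short inputs are where equal counts are most likely, which is the discretisation concern raised above.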

Perhaps you should ;) If you do, I’d be interested to know how it goes!