I don't know; I think I used about 40 languages. The beauty is that zip compression captures rich statistical properties of a language, so as a representation it should go a long way. But because the distance is based on the length of the compressed output (a whole number of bytes), the language-to-language distance is discretised. For short texts this could be a problem, since it can easily lead to ties. So, maybe. Perhaps I should try :).
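To make the tie problem concrete, here is a minimal sketch. It assumes the distance is the normalized compression distance (NCD) computed with Python's `zlib` (DEFLATE, the same algorithm zip uses). The actual setup may have used a different zip tool or formula, and the helper names `compressed_len`, `ncd` and `tied_pairs` are hypothetical. Because C(·) is an integer byte count, each distance is a ratio of two small integers when the texts are short, so two different language pairs can land on exactly the same value. Keeping the result as a `Fraction` makes that discreteness visible.

```python
import itertools
import zlib
from collections import Counter
from fractions import Fraction


def compressed_len(data: bytes) -> int:
    """Size in bytes of the zlib (DEFLATE) output; always an integer."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> Fraction:
    """Normalized compression distance, kept as an exact fraction.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed length in bytes (an assumption about
    how the distance was defined).
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_len(bx), compressed_len(by)
    cxy = compressed_len(bx + by)
    return Fraction(cxy - min(cx, cy), max(cx, cy))


def tied_pairs(texts: dict[str, str]) -> int:
    """Count language pairs whose distance equals that of at least one other pair."""
    dists = {
        (a, b): ncd(texts[a], texts[b])
        for a, b in itertools.combinations(sorted(texts), 2)
    }
    counts = Counter(dists.values())
    return sum(1 for d in dists.values() if counts[d] > 1)
```

For example, `tied_pairs({"en": "good morning", "nl": "goedemorgen", "de": "guten Morgen", "fr": "bonjour"})` reports how many of the six pairs share a distance with another pair. With snippets this short, the denominator of each distance is at most a couple of dozen bytes, so there are few possible values to go around; longer texts spread the distances over a much finer grid.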