by godelski 1482 days ago
This is a great response (I also suspected we'd learn something from the Google Translate black box). I agree with the idea that it's closer to Latin gibberish. The phonetic relationships are a great hint at what's actually going on.

My hypothesis is that these models are trained more heavily on Western languages than on others, so our latent representation of "language" is going to look like Latin gibberish, due to a combination of how these languages evolved and human bias. ("It's all Greek to me")