Some form of internal representation is crucial. Translation is an n^2 problem where some nodes like Chinese, English and Spanish have much thicker arrows, which makes traditional approaches awful for less common language pairs.
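To put rough numbers on the n^2 point, here is a minimal sketch. It assumes every ordered source/target pair would need its own direct model, and that a shared internal representation needs only one encoder and one decoder per language. The function names are made up for illustration and real systems are not this cleanly separated.

```python
def direct_pair_count(n: int) -> int:
    """Ordered (source, target) pairs among n languages, source != target."""
    return n * (n - 1)


def shared_representation_count(n: int) -> int:
    """Hypothetical setup: one encoder plus one decoder per language."""
    return 2 * n


for n in (3, 10, 100):
    # Columns: languages, direct pairs, components with a shared representation
    print(n, direct_pair_count(n), shared_representation_count(n))
```

With 100 languages that is 9900 directed pairs versus 200 components, which is why the rare pairs get starved when each pair has to be learned on its own.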

Aside from the lack of training data in many languages, I get the impression that tech companies like Google have been anglocentric in their approach, resulting in OK results only if at least one of the languages is “big”. That’s one thing that’s amazing about ChatGPT: it doesn’t discriminate between languages much, or at least it seems able to transfer knowledge really well between languages. It seems to find the higher-level patterns of human knowledge, to the point where language or even style is basically just a frontend.

Ironically, it seems the less you bother to teach computers about linguistics, the better they perform at language.