by IanCal
931 days ago
I've wondered for years how far you could get just by checking perplexity. Map English -> internal rep, and language X -> internal rep. Then learn a mapping between the internal reps such that English -> another language has low perplexity. That is, a sensible sentence in English should result in a sensible sentence in the other language.
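To make the "sensible sentence has low perplexity" signal concrete, here is a minimal sketch. It is not the approach described above (no internal representations, no mapping between languages); it only shows how perplexity separates a plausible sentence from a scrambled one. The bigram model, the add-one smoothing, the tiny corpus, and the names `train_bigram` and `perplexity` are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count bigrams and their left-hand contexts over whitespace tokens."""
    contexts, bigrams, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # (previous, current) pairs
    return contexts, bigrams, len(vocab)

def perplexity(sentence, model):
    """Per-token perplexity under an add-one-smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen bigrams at a small non-zero probability.
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))

# Hypothetical toy corpus, standing in for real target-language text.
model = train_bigram([
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased the dog",
])

sensible = perplexity("the cat sat on the rug", model)
scrambled = perplexity("rug the on sat cat the", model)
print(sensible < scrambled)  # the well-formed sentence scores lower
```

In the setup described above, the score would presumably come from a model of the target language and be used as the training signal for the mapping, rather than from word counts like these.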
Aside from the lack of training data in many languages, I get the impression that tech companies like Google have been anglocentric in their approach, resulting in OK results only if at least one of the languages is “big”. That’s one thing that’s amazing about ChatGPT: it doesn’t discriminate between languages much, or at least it seems able to transfer knowledge really well between languages. It seems to find the higher-level patterns of human knowledge, to the point where language or even style is basically just a frontend.
Ironically, it seems the less you bother to teach computers about linguistics, the better they perform at language.