Hacker News
by gpm 702 days ago
Language diversity means access to more training data, and you might also hope that by learning the same concept in multiple languages it does a better job of learning the underlying concept independent of the phrase structure...

At least from a distance, it seems like training a multilingual state-of-the-art model might well be easier than training a monolingual one.

1 comments

Multiple input and output processes in different languages have zero effect on associative learning and creative formulation, in my estimation. We've already done studies that show there is no correlation between human intelligence and knowing multiple languages, after having to put up with decades of "Americans le dumb because...", and this is no different. The amount of discourse on a single topic has a limited degree of usability before redundancies appear. Such redundancies would necessarily increase the processing burden, which could actually limit the output potential for novel associations.
Google mentioned this in one of their papers: they found that, for large enough models, including more languages did indeed lead to an overall increase in performance.
Considering Google's progress and censorship history, I'm inclined to take their assessments with a grain of salt.
Humans also don't learn by reading the entire internet... Assuming human psych studies apply to LLMs at all is just wrong.