by xico
974 days ago
That’s indeed its interface, but there is still a huge training phase during which the model was taught mostly in English, which explains why interactions in some other languages feel like direct translations of English expressions (not to mention higher-level ways of thinking). Most of the thinking is indeed done during training and then encoded in the weights, which act like a huge cache of thoughts. But a lot of what is learned is how to use the context when emitting each character (token): the attention layers, which are also a form of thinking, take whatever new input you give the model and apply a series of transformations to it (granted, a finite series in current models).
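To make the attention part concrete, here is a toy sketch in plain Python of single-head scaled dot-product attention for one query. The function names (`softmax`, `attend`) and the tiny vectors are made up for illustration; real models use learned projections, many heads and many stacked layers, so treat this as a rough picture rather than how any particular model is implemented.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Score each context token by its scaled dot product with the query.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is a weighted mix of the context tokens' value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Hypothetical context of three tokens with 2-dimensional keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

weights, output = attend(query, keys, values)
print(weights, output)
```

The weights are fixed after training (here they would live in the projections that produce `query`, `keys` and `values`), but the mixing itself depends entirely on what is in the context at inference time.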
The bigger the dataset, the better its capabilities; training it exclusively on one language doesn't mean it will be better at that language.