by adamsvystun
982 days ago
Isn't the point of transformer training for the model to learn to imitate the distribution of the training data? While the concepts of "imitating the distribution" and "copying verbatim" are different, they are not that far apart either.