Hacker News
by adamsvystun 982 days ago
Isn't the point of transformer training for the model to learn to imitate the distribution of the training data? While the concepts of "imitating the distribution" and "copying verbatim" are different, they are not that far apart either.
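
To make the point concrete, here is a rough sketch using a toy character-level n-gram model rather than a transformer (that swap is my assumption, chosen to keep it small). The names `fit_ngram` and `greedy_generate` and the one-sentence corpus are hypothetical. The idea is that a model fit purely to match next-character statistics ends up reciting its training text once it has enough context to pin down each step.

```python
from collections import Counter, defaultdict


def fit_ngram(text, order):
    """Count next-character frequencies for every context of length `order`."""
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        counts[text[i:i + order]][text[i + order]] += 1
    return counts


def greedy_generate(counts, seed, max_new):
    """Append the most frequent next character until the context is unseen."""
    order = len(seed)
    out = seed
    for _ in range(max_new):
        nxt = counts.get(out[-order:])
        if not nxt:
            break  # context never appeared in training, so stop
        out += nxt.most_common(1)[0][0]
    return out


# Hypothetical toy training set: a single sentence.
corpus = "the quick brown fox jumps over the lazy dog"

# Every 5-character context in this corpus is unique, so matching the
# training distribution and copying the text are the same thing here.
high = greedy_generate(fit_ngram(corpus, 5), corpus[:5], len(corpus))
print(high == corpus)  # True

# With a 1-character context the model still fits the same statistics,
# but it cannot reproduce the sentence verbatim.
low = greedy_generate(fit_ngram(corpus, 1), corpus[:1], len(corpus))
print(low == corpus)  # False
```

So the gap between the two concepts mostly comes down to how much capacity and context the model has relative to how much data it saw.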