by frotaur, 874 days ago
The paper mentions that the LLMs that predict the previous token are indeed pre-trained that way, so it is not true that the difference is obvious.