Hacker News
by frotaur 874 days ago
The paper mentions that the LLMs that predict the previous token are in fact pre-trained that way, so it is not true that the difference is obvious.