| HN Mirror



	by frotaur 874 days ago
	The paper mentions that the LLMs predicting the previous token are in fact pre-trained that way, so it is not true that the difference is obvious.