Arrows of Time for Large Language Models


	Arrows of Time for Large Language Models (arxiv.org)
	6 points by tianlong 865 days ago

2 comments

nyoncore 865 days ago

Isn't it obvious that, since LLMs are trained to predict the next word, they do better at that than at predicting the previous one?


frotaur 865 days ago

The paper mentions that the LLMs predicting the previous token are themselves pre-trained in that direction, so the difference is not obvious.
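
To make that setup concrete, here is a minimal sketch of how training pairs could be built for each direction. It is not taken from the paper: the function names, the toy sentence, and `context_size` are all assumptions for illustration. The point is only that the previous-token model sees the same kind of data as the next-token one, just with the sequence reversed.

```python
def next_token_pairs(tokens, context_size):
    """Build (context, target) pairs for next-token prediction."""
    pairs = []
    for i in range(1, len(tokens)):
        # Context is the window of up to `context_size` tokens before position i.
        context = tuple(tokens[max(0, i - context_size):i])
        pairs.append((context, tokens[i]))
    return pairs


def previous_token_pairs(tokens, context_size):
    """Build pairs for previous-token prediction.

    Hypothetical helper: reuses the forward routine on the reversed
    sequence, so contexts come out in reversed (right-to-left) order.
    """
    return next_token_pairs(tokens[::-1], context_size)


# Toy example (made up for illustration).
tokens = ["the", "cat", "sat", "down"]
forward = next_token_pairs(tokens, context_size=2)
backward = previous_token_pairs(tokens, context_size=2)

for context, target in forward:
    print("forward ", context, "->", target)
for context, target in backward:
    print("backward", context, "->", target)
```

Both directions yield the same number of training pairs from the same text, so neither gets a head start from the data itself.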


tianlong 865 days ago

Is there a link with entropy creation?
