Hacker News
Arrows of Time for Large Language Models (arxiv.org)
6 points by tianlong 865 days ago | 2 comments
nyoncore, 865 days ago
Isn't it obvious that, since LLMs are trained to predict the next word, they do better at that than at predicting the previous one?
frotaur, 865 days ago
The paper points out that the LLMs predicting the previous token are in fact pre-trained to do exactly that, so the difference is not obvious at all.
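A minimal sketch of why the gap is not obvious. This is not the paper's code; it assumes a toy, fully known distribution over short sequences, and every helper name (`make_toy_joint`, `forward_logprob`, `backward_logprob`, etc.) is hypothetical. By the chain rule, p(x_1, ..., x_n) factorizes both left-to-right (next-token conditionals) and right-to-left (previous-token conditionals), so an exact forward model and an exact backward model assign every sequence the same probability and have the same cross-entropy. Any forward/backward difference between trained models must therefore come from how well each direction is learned in practice.

```python
import itertools
import math
import random


def make_toy_joint(vocab, length, seed=0):
    """Hypothetical toy joint distribution over all sequences of a fixed length."""
    rng = random.Random(seed)
    seqs = list(itertools.product(vocab, repeat=length))
    weights = [rng.random() + 1e-3 for _ in seqs]  # strictly positive weights
    total = sum(weights)
    return {s: w / total for s, w in zip(seqs, weights)}


def prefix_prob(joint, prefix):
    """P(sequence starts with prefix), obtained by marginalising the joint."""
    k = len(prefix)
    return sum(p for s, p in joint.items() if s[:k] == prefix)


def suffix_prob(joint, suffix):
    """P(sequence ends with suffix), obtained by marginalising the joint."""
    k = len(suffix)
    return sum(p for s, p in joint.items() if s[len(s) - k:] == suffix)


def forward_logprob(joint, seq):
    """log p(seq) as a sum of next-token log-conditionals, left to right."""
    return sum(
        math.log(prefix_prob(joint, seq[: i + 1]) / prefix_prob(joint, seq[:i]))
        for i in range(len(seq))
    )


def backward_logprob(joint, seq):
    """log p(seq) as a sum of previous-token log-conditionals, right to left."""
    n = len(seq)
    return sum(
        math.log(suffix_prob(joint, seq[i:]) / suffix_prob(joint, seq[i + 1:]))
        for i in range(n - 1, -1, -1)
    )


def cross_entropy(joint, logprob_fn):
    """Expected negative log-likelihood (nats per sequence) under the joint."""
    return -sum(p * logprob_fn(joint, s) for s, p in joint.items())


joint = make_toy_joint(vocab=("a", "b", "c"), length=3)
fwd = cross_entropy(joint, forward_logprob)
bwd = cross_entropy(joint, backward_logprob)
print(f"forward: {fwd:.6f} nats, backward: {bwd:.6f} nats")
```

The two printed values agree up to floating-point rounding, because both sums telescope to log p(seq). The asymmetry discussed in the paper only appears once the conditionals are approximated by a trained model rather than computed exactly.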
tianlong, 865 days ago
Is there a link with entropy creation?