| HN Mirror



	by 317070 111 days ago
	As an expert in the field: this is exactly right. LLMs are trained to do whole-book prediction: at training time we feed in whole books at a time. It's only when sampling that we generate one or a few tokens at a time.
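A toy sketch of that training/sampling split, for readers who want something concrete. It assumes a count-based bigram model with add-one smoothing, which is far simpler than a real LLM, and the names `fit_bigram`, `sequence_loss` and `sample` are made up for illustration. The point is only the shape of the two loops: the loss scores every position of the whole text in one pass, while sampling emits one token at a time and feeds it back in.

```python
import math
import random

def fit_bigram(tokens, vocab):
    # Hypothetical stand-in for "training": count transitions over the whole text,
    # with add-one smoothing so every transition has nonzero probability.
    counts = {a: {b: 1 for b in vocab} for a in vocab}
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: c / total for b, c in row.items()}
    return probs

def sequence_loss(probs, tokens):
    # Training view: a single pass scores the prediction at every position
    # of the whole sequence, then averages the negative log-likelihoods.
    nll = [-math.log(probs[p][n]) for p, n in zip(tokens, tokens[1:])]
    return sum(nll) / len(nll)

def sample(probs, start, n, rng):
    # Inference view: generate one token at a time, each fed back as context.
    out = [start]
    for _ in range(n):
        row = probs[out[-1]]
        out.append(rng.choices(list(row), weights=list(row.values()))[0])
    return out

# A made-up ten-word "book", purely for illustration.
book = "the cat sat on the mat and the cat ran".split()
vocab = sorted(set(book))
model = fit_bigram(book, vocab)
loss = sequence_loss(model, book)
generated = sample(model, "the", 5, random.Random(0))
```

In a real transformer the whole-sequence pass is done in parallel with causal masking, and the loss drives gradient descent rather than counting, but the asymmetry between the two loops is the same.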

2 comments

justinator 111 days ago

where do you get these books?

honking intensifies

WHERE DO YOU GET THESE BOOKS?!


tasuki 111 days ago

The local library.


benterix 111 days ago

We do things, but it doesn't feel right


fc417fc802 111 days ago

Can anyone even say what a book really is at the end of the day? It's such an abstract concept. /s


TuringTest 111 days ago

Isn't that the same as compressing the whole book, in a special differential format that compares how the text looks from any given point before and after?


317070 111 days ago

There are many ways to describe, in simpler terms, how the model works. Next-word prediction is useful to characterize how you do inference with the model. Maximizing mutual information, compressing, gradient descent, ... are all useful characterizations of the training process.
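To make the compression reading concrete, here is a minimal sketch. It assumes an ideal entropy coder (arithmetic coding, for example) driven by the model's per-token probabilities, in which case the compressed size of a text is the sum of -log2 p over its tokens. The probabilities below are invented for illustration, not taken from any real model.

```python
import math

def code_length_bits(token_probs):
    # Ideal code length: each token costs -log2(p) bits, where p is the
    # probability the model assigned to the token that actually occurred.
    return sum(-math.log2(p) for p in token_probs)

# Made-up probabilities a model might assign to four consecutive tokens.
toy_probs = [0.5, 0.25, 0.25, 0.5]
bits = code_length_bits(toy_probs)  # 1 + 2 + 2 + 1 = 6 bits
```

Minimizing the average cross-entropy over the whole text is the same thing as minimizing this total code length, up to the base of the logarithm.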

But as stated above, next-token prediction is a misleading frame for the training process. Sampling does indeed happen one token at a time, but because of how the model is trained, much more is going on in the latent space, where the model carries its internal stream of information.


margalabargala 110 days ago

Everything is the same as everything else. It's all just hydrogen and time mixed together.
