Hacker News
by cjbprime 849 days ago
An analogy that works without having to explain anything at all about how LLMs actually work (or maybe does explain a lot, depending on how you look at it) could be:

* LLMs are lossy compression functions on their training data.

* The size of the model dictates how lossy the compression is.

* You can't spend compute to get more detail out of a model once it's been compressed/trained, any more than you can spend compute to get a very lossily compressed movie from 240p back to the original 1080p source.
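
A toy sketch of the analogy in the bullets above, in Python. It says nothing about how LLMs work internally: the block-averaging `compress`, the repeat-based `decompress`, and both signals are made up for illustration. It shows two things: a bigger compression factor gives a bigger reconstruction error, and two different sources can compress to the same thing, so no decoder can tell which one it started from.

```python
import math

def compress(signal, factor):
    """Lossy 'compression': keep only the mean of each block of `factor` samples."""
    return [sum(signal[i:i + factor]) / factor for i in range(0, len(signal), factor)]

def decompress(blocks, factor):
    """Naive 'upscaling': repeat each stored mean `factor` times."""
    return [value for value in blocks for _ in range(factor)]

def rms_error(a, b):
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Hypothetical sources: a slow wave, and the same wave plus fine detail.
smooth = [math.sin(2 * math.pi * i / 64) for i in range(256)]
detailed = [s + 0.3 * math.sin(2 * math.pi * i / 4) for i, s in enumerate(smooth)]

# Heavier compression (larger factor) loses more of the detailed source.
errors = {f: rms_error(detailed, decompress(compress(detailed, f), f)) for f in (1, 2, 4, 8)}

# At factor 4 the fine detail averages out, so both sources compress to
# (numerically) the same blocks: the difference is gone, not hidden.
same_blocks = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(compress(smooth, 4), compress(detailed, 4))
)

for factor, err in errors.items():
    print(f"factor {factor}: rms error {err:.3f}")
print("smooth and detailed indistinguishable at factor 4:", same_blocks)
```

A smarter `decompress` could produce something that looks plausible, but it would be guessing at the detail rather than recovering it.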

2 comments

You obviously can do that though; diffusion models produce better (fsvo better) images the more steps you run of them.

Similarly, LLMs can produce better answers if you teach them thinking strategies that remind them to put the available evidence and intermediate steps in their context window. Otherwise they'll tend to hallucinate an answer out of vaguely correct words.

Diffusion models are a different architecture, namely, a recursive or iterative one. Transformer models are not recursive or iterative.
Sure they are. A transformer natively outputs only one token per forward pass; the recursive (autoregressive) process of feeding each output back in is how you get the rest out of it.
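
A minimal sketch of that loop, in Python. `next_token` is a hypothetical stand-in for a single forward pass (here just a lookup table, not a real model), and `generate` is the iterative part: each output is appended to the context and fed back in.

```python
def next_token(context):
    """Hypothetical stand-in for one forward pass: context in, ONE token out."""
    table = {"the": "cat", "cat": "sat", "sat": "down", "down": "<eos>"}
    return table.get(context[-1], "<eos>")

def generate(prompt, max_new_tokens=10):
    """Autoregressive loop: call the model repeatedly, feeding its output back in."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(tokens)
        if token == "<eos>":
            break
        tokens.append(token)
    return tokens

print(generate(["the"]))  # ['the', 'cat', 'sat', 'down']
```

More iterations means more compute spent at inference time, which is the sense in which the process is iterative even though a single pass is not.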
You’re totally right … should’ve thought that one through more.
> You can't spend compute to get more detail [...]

Upscaling, technically, is a thing without limits, no?