| HN Mirror

by cjbprime 849 days ago

An analogy that works without having to explain anything at all about how LLMs actually work (or maybe does explain a lot, depending on how you look at it) could be:

* LLMs are lossy compression functions on their training data.

* The size of the model dictates how lossy the compression is.

* You can't spend compute to get more detail out of a model once it's been compressed/trained, any more than you can spend compute to turn an extremely lossily compressed 240p copy of a movie back into the original 1080p source.
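
To make the "you can't get the detail back" point concrete, here is a toy sketch. It is my own illustration, not a description of how LLM training works. "Compression" averages each block of samples, so two different originals can collapse to the same compressed output. After that, no decoder, however much compute it spends, can tell which original it came from. The names `downsample` and `upsample` are hypothetical helpers.

```python
# Toy illustration: lossy compression as a many-to-one map.
# Hypothetical helpers, not how LLMs are actually trained.

def downsample(signal: list[float], factor: int) -> list[float]:
    """Compress by averaging each block of `factor` samples into one value."""
    if len(signal) % factor != 0:
        raise ValueError("signal length must be a multiple of factor")
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal), factor)]

def upsample(signal: list[float], factor: int) -> list[float]:
    """'Decompress' by repeating each value `factor` times (nearest neighbour)."""
    return [value for value in signal for _ in range(factor)]

# Two different "high-resolution" originals...
original_a = [0.0, 4.0, 2.0, 2.0]
original_b = [2.0, 2.0, 1.0, 3.0]

# ...compress to exactly the same thing, so the difference is gone for good.
compressed_a = downsample(original_a, 2)
compressed_b = downsample(original_b, 2)

print(compressed_a == compressed_b)   # True
print(upsample(compressed_a, 2))      # [2.0, 2.0, 2.0, 2.0]
```

The reconstruction matches neither original. The information that distinguished them was discarded at compression time, not merely hidden.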

astrange 848 days ago

You obviously can do that, though: diffusion models produce better (for some value of "better") images the more denoising steps you run.

Similarly, LLMs can produce better answers if you teach them thinking strategies that remind them to put the available evidence and intermediate steps in their context window. Otherwise they'll tend to hallucinate an answer out of vaguely correct words.

profile53 848 days ago

Diffusion models are a different architecture, namely, a recursive or iterative one. Transformer models are not recursive or iterative.

astrange 848 days ago

Sure they are. A transformer only natively outputs one token per forward pass; the recursive (autoregressive) process of feeding each output token back in as input is how you get the rest out of it.
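
The loop being described can be sketched roughly like this. It is a toy illustration under stated assumptions. `next_token` stands in for a single transformer forward pass that returns exactly one token. The bigram table is a hypothetical placeholder so the code runs without any ML library. Real decoders also sample from a probability distribution rather than doing a fixed lookup.

```python
from typing import Callable

def generate(next_token: Callable[[list[str]], str], prompt: list[str],
             max_new_tokens: int, stop: str = "<eos>") -> list[str]:
    """Autoregressive decoding: one forward pass yields one token, which is
    appended to the context and fed back in for the next pass."""
    context = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(context)   # one "forward pass" -> one token
        if token == stop:
            break
        context.append(token)         # output becomes part of the next input
    return context

# Hypothetical toy "model": looks only at the last token (a bigram table).
BIGRAMS = {"the": "cat", "cat": "sat", "sat": "down", "down": "<eos>"}

def toy_model(context: list[str]) -> str:
    return BIGRAMS.get(context[-1], "<eos>")

print(generate(toy_model, ["the"], max_new_tokens=10))
# ['the', 'cat', 'sat', 'down']
```

Each step sees everything generated so far. That is also why, as noted upthread, putting intermediate steps into the context can change what comes out later.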

profile53 848 days ago

You’re totally right … should’ve thought that one through more.

frannyg 848 days ago

> You can't spend compute to get more detail [...]

Upscaling, technically, is a thing without limits, no?
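
A rough sketch of why "without limits" applies to output size but not to detail follows. Plain linear interpolation, shown here as an illustrative choice (`upscale_linear` is a hypothetical helper), can produce as many output samples as you like. However, every new sample is a weighted average of existing ones, so nothing that was lost comes back. Learned upscalers go further and fill in plausible detail from what they saw in training, but that detail is a guess rather than a recovery.

```python
def upscale_linear(signal: list[float], new_length: int) -> list[float]:
    """Resample `signal` to `new_length` points by linear interpolation."""
    if len(signal) < 2 or new_length < 2:
        raise ValueError("need at least two input and two output samples")
    out = []
    for i in range(new_length):
        # Position of output sample i in input coordinates (0 .. len-1).
        x = i * (len(signal) - 1) / (new_length - 1)
        left = min(int(x), len(signal) - 2)
        frac = x - left
        # Each output is a blend of two existing inputs: no new information.
        out.append(signal[left] * (1 - frac) + signal[left + 1] * frac)
    return out

low_res = [0.0, 10.0]
print(upscale_linear(low_res, 5))          # [0.0, 2.5, 5.0, 7.5, 10.0]
print(len(upscale_linear(low_res, 1000)))  # 1000
```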
