by tbalsam
1175 days ago
The fuzzy JPEG analogy and its kin ignore the internal disentanglement of ideas, which is what separates LLMs from, say, a probabilistic chain producer. One can think of it as a NeRF of an underlying manifold rather than a collage of pictures taken of that manifold, which is an important distinction: it learns the manifold, not the manifold samples. That's what makes it so powerful and lets it coherently mix and match very abstract concepts. Even when it gets things wrong, one could liken that to the fuzziness of a NeRF in regions where there isn't much data. That's why this whole "average" business is silly nonsense. We're minimizing the empirical risk over the dataset under a likelihood-style objective, not the L2 loss over it, for Pete's sake.
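To make the L2 point concrete, here is a rough sketch, assuming the objective being contrasted with L2 is a cross-entropy loss over discrete outcomes. The toy data and the function names are made up for illustration. On a two-mode target, the best constant under L2 is the mean, which lands between the modes, while the best categorical distribution under cross-entropy is just the empirical frequencies, so both modes survive.

```python
from collections import Counter

# Hypothetical toy data: the target is 0 half the time and 10 the other half.
samples = [0] * 50 + [10] * 50

def l2_minimizer(values):
    # The single constant that minimizes mean squared error is the arithmetic mean.
    return sum(values) / len(values)

def cross_entropy_minimizer(values):
    # The categorical distribution that minimizes cross-entropy against the data
    # is the table of empirical frequencies.
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

point_estimate = l2_minimizer(samples)
distribution = cross_entropy_minimizer(samples)

print("L2 minimizer:", point_estimate)
print("Cross-entropy minimizer:", distribution)
```

The L2 answer is a value that never occurs in the data, whereas the cross-entropy answer keeps the two modes apart. Whether a sampler then mostly emits the top mode is a separate question from what the loss learns.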
"John Smith"
Where is it from?
"New York City"
I haven't tried turning up the temperature, but I assume that's what it takes to get unexpected responses out of it.
There is definitely an element of averaging going on in these models, and it's worth staying aware of it. This is IMO also the cause of ChatGPT's odd disembodied voice: it's always projecting some modes in the data.
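For what the temperature knob does to the modes, here is a minimal sketch of the usual softmax-with-temperature formulation. The candidate names and logits are invented for illustration, and this is not a claim about how any particular product implements sampling.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidates and logits; the first one plays the role of the mode.
names = ["John Smith", "Maria Garcia", "Xu Wei"]
logits = [3.0, 1.0, 0.5]

cold = softmax_with_temperature(logits, 0.5)  # low temperature sharpens the mode
hot = softmax_with_temperature(logits, 2.0)   # high temperature flattens it

for name, p_cold, p_hot in zip(names, cold, hot):
    print(f"{name}: T=0.5 -> {p_cold:.3f}, T=2.0 -> {p_hot:.3f}")
```

At low temperature nearly all the probability mass sits on the most likely answer, which is consistent with always getting the "John Smith" kind of reply. Raising the temperature gives the less likely candidates a real chance of being sampled.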