by fc417fc802 110 days ago
> but do not have even a theory about how the behavior emerges from among the math

Actually we have an awful lot of those.

I'm not sure if emergent is quite the right term here. We carefully craft a scenario to produce a usable gradient for a black box optimizer. We fully expect nontrivial predictions of future state to result in increasingly rich world models out of necessity.

It gets back to the age-old observation that any sufficiently accurate model must be as complex as the system it models. "Predict the next word" is but a single example of the general principle at play.


> black box optimizer

This is an admission we don't know how it emerges.

Sure, we expect the behavior to emerge, but we don't know how.

No, as I said, we have _lots_ of theories about exactly that at various levels of detail. The theories vary based on (at least) the specifics of the loss function being employed to construct the gradient. Giving an overview of that is far beyond the scope of this comment section (but it's well-trodden ground, so you can just go ask an LLM).

The "black box" bit refers to a generic, interchangeable optimization algorithm that simply makes the number go down (or up or whatever).
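
To make the "interchangeable" point concrete, here's a toy sketch in plain Python. Everything in it is made up for illustration: the quadratic `loss`, its `target`, the step counts and the learning rates are all assumptions, and a real training setup looks nothing like this in scale. The only thing it's meant to show is that the loss defines the gradient, and two different optimizers can be swapped in without knowing anything else about the problem.

```python
from typing import Callable, List

Vector = List[float]


def loss(w: Vector) -> float:
    # Hypothetical stand-in for a training objective:
    # squared distance to an arbitrary target point.
    target = [3.0, -2.0]
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))


def grad(f: Callable[[Vector], float], w: Vector, eps: float = 1e-6) -> Vector:
    # Central finite differences. The optimizers below only ever see
    # this gradient, never the internals of f.
    g = []
    for i in range(len(w)):
        hi = w[:i] + [w[i] + eps] + w[i + 1:]
        lo = w[:i] + [w[i] - eps] + w[i + 1:]
        g.append((f(hi) - f(lo)) / (2 * eps))
    return g


def gradient_descent(f, w: Vector, steps: int = 200, lr: float = 0.1) -> Vector:
    # Plain gradient descent: step against the gradient.
    for _ in range(steps):
        g = grad(f, w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def momentum(f, w: Vector, steps: int = 200, lr: float = 0.05, beta: float = 0.9) -> Vector:
    # Heavy-ball momentum: accumulate a velocity, then step along it.
    v = [0.0] * len(w)
    for _ in range(steps):
        g = grad(f, w)
        v = [beta * vi + gi for vi, gi in zip(v, g)]
        w = [wi - lr * vi for wi, vi in zip(w, v)]
    return w


if __name__ == "__main__":
    start = [0.0, 0.0]
    for optimizer in (gradient_descent, momentum):
        w = optimizer(loss, start)
        print(optimizer.__name__, loss(w))
```

Both optimizers drive the same number down from the same starting point. Change `loss` and you change what gets learned; change the optimizer and you mostly change how quickly you get there.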

There are certainly various details about the internal workings of models that we don't properly understand, but a blanket claim about the whole is erroneous.