by hnfong 114 days ago
> black box optimizer

This is an admission that we don't know how it emerges.

Sure, we expect the behavior to emerge, but we don't know how.

1 comment

No, as I said, we have _lots_ of theories about exactly that, at various levels of detail. The theories vary with (at least) the specifics of the loss function used to construct the gradient. An overview of them is far beyond the scope of this comment section (but it's well-trodden ground, so you can just go ask an LLM).

The "black box" bit refers to a generic, interchangeable optimization algorithm that simply makes the number go down (or up or whatever).
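
To make "interchangeable" concrete, here's a rough sketch (mine, not taken from any particular library). The toy `loss` and both optimizers are hypothetical stand-ins; the only point is that each optimizer treats `loss` as an opaque function it can call, and either one can be swapped for the other without touching the objective.

```python
import random

def loss(w):
    # Hypothetical objective: squared distance from the point (3, -2).
    return (w[0] - 3.0) ** 2 + (w[1] + 2.0) ** 2

def gradient_descent(f, w, steps=200, lr=0.1, eps=1e-6):
    # Estimates the gradient by finite differences, so it only ever calls f.
    w = list(w)
    for _ in range(steps):
        base = f(w)
        grad = []
        for i in range(len(w)):
            bumped = list(w)
            bumped[i] += eps
            grad.append((f(bumped) - base) / eps)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def random_search(f, w, steps=5000, scale=0.1, seed=0):
    # Proposes random nudges and keeps one only if it lowers f.
    rng = random.Random(seed)
    w = list(w)
    best = f(w)
    for _ in range(steps):
        cand = [wi + rng.gauss(0.0, scale) for wi in w]
        val = f(cand)
        if val < best:
            w, best = cand, val
    return w

start = [0.0, 0.0]
# Same objective, two different "make the number go down" procedures.
results = {opt.__name__: opt(loss, start) for opt in (gradient_descent, random_search)}
```

Neither routine knows anything about what the number means. Everything specific to the problem lives in `loss`, which is why the theories about what emerges hang on the loss rather than on the optimizer.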

There are certainly various details about the internal workings of models that we don't properly understand, but a blanket claim about the whole is erroneous.