Hacker News new | ask | show | jobs
by appplication 759 days ago
We do actually understand generally well enough what is happening. Attention isn’t some mysterious unexplained mechanism. We know how it works and why. When people describe these models as a black box, they typically mean that there are too many layers and weights to explain to you exactly why it chose, for example, a specific sequence of words. But we can certainly explain exactly why it would chose some sequence, and why that sequence would be expected to be relevant.

Simplifying a bit, but attention provides a way for the model to build context on one word based on how often it is seen with others. It doesn’t have a concept of correct or incorrect. It doesn’t have a concept of reasoning.

What is impressive is that even without these concepts of correctness and reasoning, the model can still perform quite well on tasks where correctness and reasoning would be expected. But this is more a statement on the corpus of knowledge and the power of language in general than it is on the models capabilities itself. It’s important not to confuse the ability to seem correct and seem well reasoned with any actual mechanism to do so.

1 comments

> We do actually understand generally well enough what is happening.

See the comment on the "Golden Gate Bridge" version of Claude:

"The fact that we can find and alter these features within Claude makes us more confident that we’re beginning to understand how large language models really work." (emphasis mine)

https://www.anthropic.com/news/golden-gate-claude