by mike_hearn 562 days ago

> why don't they demonstrate that you can predict whether a trained but completely unprompted model will "know" the answer?

The answer to what? You have to ask a question to test whether the answer will be accurate, and that's the prompt. I don't understand this objection.

> If the LLM stores facts in its weights, you should be able to demonstrate that completely at rest.

Sure, with good enough interpretability systems, and those are being worked on. Anthropic can already locate which parts of the model fire on specific topics or themes and force them on or off by manipulating the activation vectors.
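
To make "force them on or off" concrete, here's a toy sketch of the general idea, not Anthropic's actual method. It assumes a feature corresponds to a single direction in activation space and that you already know that direction; the vectors and the `steer` / `feature_strength` helpers are made up for illustration.

```python
import math

def unit_vector(direction):
    """Scale `direction` to length 1."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [d / norm for d in direction]

def feature_strength(activation, direction):
    """How strongly the (hypothetical) feature fires: projection onto its direction."""
    unit = unit_vector(direction)
    return sum(a * u for a, u in zip(activation, unit))

def steer(activation, direction, target):
    """Return a copy of `activation` whose component along `direction` equals `target`.

    target = 0.0 switches the feature off; a large target forces it on.
    Everything orthogonal to `direction` is left untouched.
    """
    unit = unit_vector(direction)
    current = sum(a * u for a, u in zip(activation, unit))
    return [a + (target - current) * u for a, u in zip(activation, unit)]

# Made-up 4-dimensional activation and feature direction (assumptions, not real model data).
activation = [0.5, -1.2, 2.0, 0.3]
direction = [1.0, 0.0, 1.0, 0.0]

switched_off = steer(activation, direction, 0.0)
forced_on = steer(activation, direction, 10.0)
```

In a real model you'd apply the edit to a layer's activations during the forward pass and watch how the output changes. Finding the direction in the first place is the hard part, which is what the interpretability work is for.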

> A question: Why don't LLMs produce garbage grammar when they "hallucinate"?

Early models did.