Hacker News
by Animats 2 days ago
That's real progress. The paper behind it is [1]. The authors try to extract "attribution graphs" to understand why the LLM produced a given result. It's encouraging to see more work on what's going on inside these models.

They obtained insight into one specific kind of hallucination: the model not actually knowing a specific fact. "We uncover circuit mechanisms that allow the model to distinguish between familiar and unfamiliar entities, which determine whether it elects to answer a factual question or profess ignorance. 'Misfires' of this circuit can cause hallucinations." That should be tested on queries that led the model to make up legal citations.
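To make the quoted mechanism concrete, here is a toy sketch. It is not the paper's method or circuit. A hypothetical familiarity score per entity gates whether the model answers or professes ignorance. A "misfire" is the gate reporting "familiar" for an entity the model holds no fact about, so it answers anyway with something made up. The names `ToyModel` and `is_misfire`, the scores, and the 0.5 threshold are all invented for illustration. Real attribution graphs are computed from model internals, not from a lookup table.

```python
# Hypothetical toy of a "familiar vs. unfamiliar entity" gate.
# NOT the circuit from the paper: scores, threshold, and names are made up.
from dataclasses import dataclass


@dataclass
class ToyModel:
    familiarity: dict[str, float]  # hypothetical "this looks familiar" score per entity
    facts: dict[str, str]          # what the toy model actually knows
    threshold: float = 0.5         # assumed cutoff for "familiar"

    def answer(self, entity: str) -> str:
        score = self.familiarity.get(entity, 0.0)
        if score < self.threshold:
            # Unfamiliar entity: profess ignorance.
            return "I don't know."
        # Familiar entity: the gate lets an answer through. If no fact is
        # stored, the toy still answers, which stands in for a hallucination.
        return self.facts.get(entity, f"<made-up fact about {entity}>")


def is_misfire(model: ToyModel, entity: str) -> bool:
    """True when the gate says 'familiar' but no fact is actually stored."""
    familiar = model.familiarity.get(entity, 0.0) >= model.threshold
    return familiar and entity not in model.facts
```

Running `is_misfire` over a batch of queries, for example ones that produced fake legal citations, is roughly the kind of test suggested above. On a real model, though, the familiarity signal would have to be read out of the circuit itself rather than looked up.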

[1] https://transformer-circuits.pub/2025/attribution-graphs/met...