by energy123 348 days ago
Why was Anthropic's interpretability work not discussed? Inconvenient for the conclusion?

https://www.anthropic.com/news/tracing-thoughts-language-mod...

1 comment

The same work in which they show that the LLM doesn't know what it "thinks" or how it arrives at its conclusions? Where they demonstrate that it outputs what is statistically most probable, even though the logits indicate it was something else?