Hacker News
by
energy123
348 days ago
Why was Anthropic's interpretability work not discussed? Inconvenient for the conclusion?
https://www.anthropic.com/news/tracing-thoughts-language-mod...
lossolo
347 days ago
The same work in which they show that the LLM doesn’t know what it "thinks" or how it arrives at its conclusions? Where they demonstrate that it outputs what is statistically most probable, even though the logits indicate it was something else?