| HN Mirror



	by energy123 396 days ago
	Why was Anthropic's interpretability work not discussed? Inconvenient for the conclusion? https://www.anthropic.com/news/tracing-thoughts-language-mod...

1 comment

lossolo 395 days ago

The same work in which they show that the LLM doesn’t know what it "thinks" or how it arrives at its conclusions? Where they demonstrate that it outputs what is statistically most probable, even though the logits indicate it was something else?
