HN Mirror



	by colah3 272 days ago
	(Disclaimer: I work on interpretability at Anthropic.) I wanted to flag that this is an accessible blog post and that there's a link to the paper ( https://transformer-circuits.pub/2025/introspection/index.ht... ) at the top. The paper explores this in more detail and with more rigor.