by colah3 225 days ago
(Disclaimer: I work on interpretability at Anthropic.)

I wanted to flag that this is an accessible blog post, and that there's a link to the paper (https://transformer-circuits.pub/2025/introspection/index.ht...) at the top. The paper explores this in more detail and with more rigor.