
Hello, I am the author. This is not an LLM-generated article: I wrote it by hand and had an LLM adapt it from a thread on X. You can see the original thread here: https://x.com/mathemagic1an/status/2035850046735098065

> the fact that language models have human-interpretable representations and neurons has been known since BERT... Circuits research also does not come from Anthropic...

The article does not claim Anthropic invented the field, only that they have made important contributions to it. It is intended as an overview of a specific set of ideas that are working for mechanistic interpretability, not a formal literature review.