Hacker News new | ask | show | jobs
by solarwindy 353 days ago
The relevant research field is known as mechanistic interpretability. See:

https://arxiv.org/abs/2404.14082

https://www.anthropic.com/research/mapping-mind-language-mod...