Y
Hacker News
new
|
ask
|
show
|
jobs
by
solarwindy
353 days ago
The relevant research field is known as mechanistic interpretability. See:
https://arxiv.org/abs/2404.14082
https://www.anthropic.com/research/mapping-mind-language-mod...