|
|
|
|
|
by JieJie
636 days ago
|
|
I’m not sure if this is exactly what you are referring to, but Anthropic has done a lot of interpretability work on Claude, which they’ve published along with the famous "Golden Gate Claude".^1 "We also find more abstract features—responding to things like bugs in computer code, discussions of gender bias in professions, and conversations about keeping secrets." 1: https://www.anthropic.com/research/mapping-mind-language-mod... |
|