by minimaltom 35 days ago
Between this, the emotions paper, Golden Gate Claude, etc., it doesn't seem like such a stretch that Anthropic are doing some kind of activation steering as part of training (and that it's part of their lead)

it could be helpful in getting their learnings to generalize from RL