|
|
|
|
|
by vessenes
52 days ago
|
|
It’s a particular sort of bug that’s harder to detect because … internal Anthropic engineers don’t apply these prompts to themselves, and in fact have access to ‘helpful only’ models that also do not have additional limitations RL’ed in. (Or perhaps they’re RL’ed out - not sure of current training mechanisms.) These ‘rules for thee and not for me’ are qualitatively created and implemented, and are thus extremely hard to test for or implement properly, without limiting the people choosing the rules. |
|
....Right?
What kind of Mickey mouse operation are they running over there?