|
There is something darkly comical about using an LLM to write up your “a coding agent deleted our production database” Twitter post. On another note, I consider users asking a coding agent “why did you do that” to be illustrating a misunderstanding in the users mind about how the agent works. It doesn’t decide to do something and then do it, it just outputs text. Then again, anthropic has made so many changes that make it harder to see the context and thinking steps, maybe this is an attempt at clawing back that visibility. |
Bit it can still be useful, as long as you interpret it as "which stimuli most likely triggered the behaviour?" You can't trust it uncritically, but models do sometimes pinpoint useful things about how they were prompted.