by herval
251 days ago
Fascinating write-up. I loved this bit of debugging:

> The first time we played this game, Claude told me that the subagents had gotten a perfect score. After a bit of prodding, I discovered that Claude was quizzing the subagents like they were on a gameshow. This was less than useful. I asked to switch to realistic scenarios that put pressure on the agents, to better simulate what they might actually do.

Also, his Claude says "shit" a lot.