|
|
|
|
|
by warwickmcintosh
78 days ago
|
|
The sanitised optimism problem mentioned upthread is the real gap here. Event stream logging tells you what tools were called and in what order, but it doesn't tell you whether the agent's self-reported outcome matches reality. |
|