by warwickmcintosh 78 days ago
The sanitised optimism problem mentioned upthread is the real gap here. Event stream logging tells you what tools were called and in what order, but it doesn't tell you whether the agent's self-reported outcome matches reality.
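
To make that gap concrete, here is a rough sketch of what closing it might look like: compare what the agent said happened against an independent check of the actual state. Everything here is hypothetical and for illustration only: `ToolEvent`, `find_mismatches`, the `write_file` tool name and the `report.txt` path are made-up names, not part of any real agent framework.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToolEvent:
    # Hypothetical event-stream record: which tool ran and what the agent said happened.
    tool: str
    claimed_ok: bool


def find_mismatches(
    events: List[ToolEvent], checks: Dict[str, Callable[[], bool]]
) -> List[str]:
    """Return the tools whose self-reported outcome disagrees with an independent check."""
    mismatches = []
    for event in events:
        check = checks.get(event.tool)
        if check is None:
            continue  # no independent check available; the log alone cannot settle this
        if check() != event.claimed_ok:
            mismatches.append(event.tool)
    return mismatches


# Demo: the agent reports a successful write, but the file was never created.
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "report.txt")
    events = [ToolEvent(tool="write_file", claimed_ok=True)]
    checks = {"write_file": lambda: os.path.exists(target)}
    demo_mismatches = find_mismatches(events, checks)
```

The event log in the demo looks clean (one call, reported as successful), and only the separate `os.path.exists` check exposes the disagreement. The hard part in practice is presumably writing checks that don't themselves rely on the agent's account.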