by zippolyon
95 days ago
The dashcam analogy is sharp. I'd extend it: most tools record what happened (tool X was called, output was Y), but not why the agent deviated from the plan. That's the gap that actually hurts during post-mortems.
In my experience, the useful question isn't "what did the agent do?" but "at step T, the agent's stated intent was Z, but it executed W instead. Was that model drift, a context window issue, or a tool failure?" Without causal structure in the log, you're left correlating timestamps and guessing.
The DataTalks/Replit incidents both had this signature: the deviation was visible in hindsight from the logs, but no system caught the intent-execution gap in real time.
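To make "causal structure" a bit more concrete, here is a rough sketch of the kind of record I have in mind. Everything in it is hypothetical: the `StepRecord` schema, the field names, and the labels are made up for illustration and don't come from any existing tool. The idea is just to log the stated intent next to the executed call, so the gap can be checked at the step where it happens instead of being reconstructed from timestamps later.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StepRecord:
    """One agent step with intent and execution logged side by side.

    Hypothetical schema, for illustration only.
    """
    step: int
    intended_tool: str               # what the agent said it would call
    intended_args: dict[str, Any]
    executed_tool: str               # what was actually called
    executed_args: dict[str, Any]
    tool_error: Optional[str] = None  # set when the tool call itself failed


def classify_step(rec: StepRecord) -> str:
    """Return a coarse label: 'intent_execution_gap', 'tool_failure', or 'ok'."""
    if (rec.intended_tool != rec.executed_tool
            or rec.intended_args != rec.executed_args):
        return "intent_execution_gap"
    if rec.tool_error is not None:
        return "tool_failure"
    return "ok"


def first_gap(records: list[StepRecord]) -> Optional[StepRecord]:
    """Return the earliest step where execution diverged from stated intent."""
    for rec in records:
        if classify_step(rec) == "intent_execution_gap":
            return rec
    return None
```

Running `classify_step` as each step is written would flag the gap in real time. Telling model drift apart from a context window issue would need more signals than this sketch records, for example whether the plan was still in the context at that step.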