by hipvlady
23 days ago
It wasn't a crash that cost us the most time; it was a stale read. Two agents shared a plan file. One updated the file mid-run, while the other kept working from the version it had loaded at the start. Both produced plausible output, and nothing errored. We only caught it in review, after hours of generation time had already been wasted. I now treat any artefact that two agents can both access as a coordination problem, not a storage problem. The cheapest and most effective fix was to stamp a version on the artefact and re-check that version before acting, instead of trusting the copy read at the start of the run.
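A minimal sketch of the version-stamp idea, assuming an in-process store shared by both agents. `VersionedArtefact`, `StaleReadError` and `act_on_plan` are hypothetical names, not from any library. A real plan file would need the version kept in the file itself or in a sidecar, plus a lock or atomic rename around writes. Reads return the content together with its version. Before acting, an agent compares its version with the current one and reloads if they differ. Writes are compare-and-swap, so a write based on a stale copy is rejected instead of silently overwriting newer work.

```python
import threading
from dataclasses import dataclass


class StaleReadError(Exception):
    """Raised when a write is based on an outdated version (hypothetical)."""


@dataclass(frozen=True)
class Snapshot:
    content: str
    version: int


class VersionedArtefact:
    """Hypothetical shared artefact (e.g. a plan file) stamped with a version."""

    def __init__(self, content: str) -> None:
        self._lock = threading.Lock()
        self._content = content
        self._version = 1

    def read(self) -> Snapshot:
        with self._lock:
            return Snapshot(self._content, self._version)

    def current_version(self) -> int:
        with self._lock:
            return self._version

    def write(self, new_content: str, expected_version: int) -> Snapshot:
        # Compare-and-swap: only write if nobody else has written since our read.
        with self._lock:
            if self._version != expected_version:
                raise StaleReadError(
                    f"expected v{expected_version}, found v{self._version}"
                )
            self._content = new_content
            self._version += 1
            return Snapshot(self._content, self._version)


def act_on_plan(artefact: VersionedArtefact, snapshot: Snapshot) -> Snapshot:
    """Re-check the version right before acting; reload if the copy is stale."""
    if artefact.current_version() != snapshot.version:
        snapshot = artefact.read()  # stale copy: refresh before doing any work
    # ... the agent's actual work against snapshot.content would go here ...
    return snapshot


plan = VersionedArtefact("step 1: draft outline")
agent_b_view = plan.read()  # agent B loads v1 at the start of its run
snap_a = plan.read()
plan.write("step 1: draft outline\nstep 2: cite sources", snap_a.version)  # agent A bumps to v2
fresh = act_on_plan(plan, agent_b_view)  # agent B sees v1 != v2 and reloads
print(fresh.version)  # 2
```

The check-then-act step narrows the window but does not close it: another agent can still write between the check and the action. That is why the write path is also versioned, so the stale writer gets an error rather than clobbering the newer plan.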