|
This matches my experience, especially "don’t draw the owl" and the harness-engineering idea. The failure mode I kept hitting wasn’t just "it makes mistakes", it was drift: it can stay locally plausible while slowly walking away from the real constraints of the repo. The output still sounds confident, so you don’t notice until you run into reality (tests, runtime behaviour, perf, ops, UX). What ended up working for me was treating chat as where I shape the plan (tradeoffs, invariants, failure modes) and treating the agent as something that does narrow, reviewable diffs against that plan. The human job stays very boring: run it, verify it, and decide what’s actually acceptable. That separation is what made it click for me. Once I got that loop stable, it stopped being a toy and started being a lever. I’ve shipped real features this way across a few projects (a git like tool for heavy media projects, a ticketing/payment flow with real users, a local-first genealogy tool, and a small CMS/publishing pipeline). The common thread is the same: small diffs, fast verification, and continuously tightening the harness so the agent can’t drift unnoticed. |
Yeah I would get patterns where, initial prototypes were promising, then we developed something that was 90% close to design goals, and then as we try to push in the last 10%, drift would start breaking down, or even just forgetting, the 90%.
So I would start getting to 90% and basically starting a new project with that as the baseline to add to.