|
|
|
|
|
by kian
16 days ago
|
|
"In my experience, AI still drifts from what I meant it to do on anything bigger than building a widget." I've had code bases with tens of thousands of lines of code built from scratch that I hand-reviewed every line of and worked with the AI to improve, and haven't had this issue. I feel like a significant part of this is due to an involved /plan stage -- going back and forth on building out a plan for what you want the AI to do involves surfacing the assumptions that you would have called drift if you asked them to implement it directly from your prompt. Once the plan has been refined and is what I want it to be, getting it to implement everything in TDD style has for the most part given me 100% working code, as I wanted it to be, without issues. It definitely helps that I'm a principal-level engineer with extensive architectural experience -- but if you're able to tell the AI in detail what you want, have it ask questions for clarifications, and read through a plan before getting it implemented, and have a solid testing plus manual qa process (automated by chrome devtools mcp) in place, I've find that you can one-shot complex features, rewrites, and even not-insignificant applications that would have taken days to write by hand in a few hours. |
|
Still Claude will sneak things in - in my recent plan, for example I had defined, per acceptance criteria what colours the statuses should be: green for live, blue for sold, grey for anything else; it changed this to: green for live, orange for in progress, blue for sold, red in demolition, etc. When pressed why did it to this, it was unable to explain why. This is with a plan where AC were explicitly provided from the task in Given/When/Then format and were to be adhered to strictly. I've caught this within planning, but I shouldn't need to be doing this.
Even in standard prompts where I tell it "Change this label from X to Y", it ended reordering the tabs unrelated to ask. Again I was not able for it to explain why - it was so abrupt. And it was in fresh context, without any pollution on what I expect it to do.
I also noticed a different behaviour regarding skill; today and yesterday it would not be following skill guidance at all ie: skill writing skill - I'd have to explicitly tell it to test skills after writing them, when this is a behaviour expected by default. Similarly with other skills - knowing that it should have done something per skill guidelines and it not doing it at all. This is new behaviour that I've not seen a week ago.