| As the models have progressively improved (able to handle more complex code bases, longer files, etc) I’ve started using this simple framework on repeat which seems to work pretty well at one shorting complex fixes or new features. [Research] ask the agent to explain current functionality as a way to load the right files into context. [Plan] ask the agent to brainstorm the best practices way to implement a new feature or refactor. Brainstorm seems to be a keyword that triggers a better questioning loop for the agent. Ask it to write a detailed implementation plan to an md file. [clear] completely clear the context of the agent —- better results than just compacting the conversation. [execute plan] ask the agent to review the specific plan again, sometimes it will ask additional questions which repeats the planning phase again. This loads only the plan into context and then have it implement the plan. [review & test] clear the context again and ask it to review the plan to make sure everything was implemented. This is where I add any unit or integration tests if needed. Also run test suites, type checks, lint, etc. With this loop I’ve often had it run for 20-30 minutes straight and end up with usable results. It’s become a game of context management and creating a solid testing feedback loop instead of trying to purely one-shot issues. |
The biggest gotcha I found is that these LLMs love to assume that code is C/Python but just in your favorite language of choice. Instead of considering that something should be written encapsulated into an object to maintain state, it will instead write 5 functions, passing the state as parameters between each function. It will also consistently ignore most of the code around it, even if it could benefit from reading it to know what specifically could be reused. So you end up with copy-pasta code, and unstructured copy-pasta at best.
The other gotcha is that claude usually ignores CLAUDE.md. So for me, I first prompt it to read it and then I prompt it to next explore. Then, with those two rules, it usually does a good job following my request to fix, or add a new feature, or whatever, all within a single context. These recent agents do a much better job of throwing away useless context.
I do think the older models and agents get better results when writing things to a plan document, but I've noticed recent opus and sonnet usually end up just writing the same code to the plan document anyway. That usually ends up confusing itself because it can't connect it to the code around the changes as easily.