I recently investigated some problematic behaviour of both Opus 4 and Sonnet 4. When tasked with developing something more complicated (a broker-fed task management system, a staggered execution scheduler), they would inevitably produce thousands of lines of over-engineered, unmaintainable garbage. When Opus was then tasked with simplifying it, it boiled it down to 300 lines in one shot. The result was brilliant. This happened twice.
Moral of the story: I found out that I didn't constrain them enough. I now insist that they keep the core logic to a certain size (e.g. 300 lines) and not produce code objects for each concept but rather "fold them into the code".
This improved the output tremendously.
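To illustrate what "folding concepts into the code" can look like, here is a minimal sketch of a staggered scheduler. It is not the code the models produced; the function name `run_staggered` and its parameters are made up for this example, and time is simulated rather than slept on. Instead of separate `Task`, `Schedule` and `Executor` classes, the task is a plain callable, the schedule is a heap of tuples, and the executor is a loop.

```python
import heapq


def run_staggered(tasks, stagger=1.0, start=0.0):
    """Run callables in order, each due `stagger` time units after the previous.

    Hypothetical helper for illustration. Time is simulated: nothing sleeps,
    the due time is only recorded alongside each result.
    """
    # The "schedule" is just a heap of (due_time, index, callable) tuples.
    # The index breaks ties so the callables themselves are never compared.
    queue = [(start + i * stagger, i, fn) for i, fn in enumerate(tasks)]
    heapq.heapify(queue)

    results = []
    # The "executor" is this loop: pop the earliest task and call it.
    while queue:
        due, _, fn = heapq.heappop(queue)
        results.append((due, fn()))
    return results
```

Calling `run_staggered([lambda: "a", lambda: "b"], stagger=0.5)` returns `[(0.0, "a"), (0.5, "b")]`. Whether this much compression is appropriate depends on the project, but it shows the kind of shape a size limit tends to push the model toward.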