by andyferris 35 days ago

To me, when you apply an LLM to a Python (or similarly dynamic) code base that's currently spaghetti and monkey-patched, it can miss things just like I can.

But… I have to admit Opus 4.7 has been very pragmatic in detecting root causes and proposing sensible fixes to bugs in this situation (i.e. bugs encountered in production, not at compile time).

It’s also fine at matching current styles and conventions (which is great if they are good styles and conventions).

In terms of new code, Rust code with such a high degree of non-local reasoning would have been near impossible to write, so I'm assuming these bugs wouldn't be present.

1 comment

gbro3n 34 days ago

The larger models really are more reliable at following instructions and reasoning their way to solutions. I haven't found that the harness makes that much difference. Copilot, Claude, Pi all give similar results for me. What really does make a difference is clean task separation and a clear plan / todo / implement workflow. I've consolidated a lot of the way I work with agents in https://www.agentkanban.io - the task board keeps the tasks discrete and minimal. I built the plan / todo / implement workflow into the agent instruction that binds the board task to the chat.