Hacker News
by mbreese 40 days ago
But every abstraction that an LLM has to write is a choice. Your way of writing Python may not match that choice. The next run of the agent might not choose the same way.

Because the language gives you many different tools, an LLM-generated codebase can quickly become inconsistent and overly complicated. Python's flexibility is a downside when you're having an LLM generate the code. If you're working in an existing codebase, it's great: those choices were already made, and the LLM can match your style.

When an LLM has to derive its own style is when things can devolve into a jumbled mess.
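To make "many different tools" concrete, here is a small illustrative sketch (the `Point*` names are hypothetical): three equally idiomatic ways to model the same two-field record. An agent starting from scratch could reasonably pick any of them, and a later run might pick a different one.

```python
from dataclasses import dataclass
from typing import NamedTuple

# Style 1: a plain dict
point_a = {"x": 1, "y": 2}

# Style 2: a NamedTuple (immutable, and still a tuple underneath)
class PointNT(NamedTuple):
    x: int
    y: int

# Style 3: a dataclass (a regular class with generated __init__/__eq__)
@dataclass
class PointDC:
    x: int
    y: int

point_b = PointNT(1, 2)
point_c = PointDC(1, 2)

# Same data, but callers need different access syntax:
print(point_a["x"], point_b.x, point_c.x)  # 1 1 1

# ...and the styles don't even compare the same way against a plain tuple:
print(point_b == (1, 2))  # True: a NamedTuple is a tuple
print(point_c == (1, 2))  # False: dataclass __eq__ only compares instances of the same class
```

Mixing these across a codebase means no caller can rely on a single access or comparison pattern, which is the kind of drift the comment describes.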


To me, when applying LLMs to a Python (or similarly dynamic) codebase that's currently spaghetti and monkey-patched, the model can miss things just like I can.

But… I have to admit Opus 4.7 has been very pragmatic in detecting root causes and proposing sensible fixes to bugs in this situation (i.e., bugs encountered in production, not at compile time).

It’s also fine at matching current styles and conventions (which is great if they are good styles and conventions).

For new code, Rust would have made it nearly impossible to write something requiring this much non-local reasoning, so I'm assuming these bugs wouldn't be present.
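A minimal, hypothetical sketch of the "monkey patched, non-local" problem: `load_config` looks self-contained, but its behavior depends on a patch applied somewhere else entirely. The function names and the lenient "compat shim" are invented for illustration, not taken from any real codebase.

```python
import json

def load_config(text: str) -> dict:
    # Reads as local and obvious: parse JSON, raise on bad input.
    return json.loads(text)

# Elsewhere in the codebase (hypothetical compat shim), someone patches the module.
_original_loads = json.loads

def _lenient_loads(text, *args, **kwargs):
    # Swallows parse errors and returns an empty dict instead of raising.
    try:
        return _original_loads(text, *args, **kwargs)
    except json.JSONDecodeError:
        return {}

json.loads = _lenient_loads  # global side effect visible to every caller

# Reading load_config alone, you'd expect this to raise; it silently returns {}.
result = load_config("not json")
print(result)  # {}
```

Because `load_config` looks up `json.loads` at call time, the patch changes its behavior without touching its source, so neither a human nor an LLM reading the function in isolation can see the bug.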

The larger models really are more reliable at following instructions and reasoning their way to solutions. I haven't found that the harness makes much difference: CoPilot, Claude, and Pi all give similar results for me. What really does make a difference is clean task separation and a clear plan / todo / implement workflow. I've consolidated a lot of how I work with agents into https://www.agentkanban.io: the task board keeps tasks discrete and minimal, and I built the plan / todo / implement workflow into the agent instructions that bind the board task to the chat.

That's why teams using LLMs properly on large Python codebases establish coding-standards docs and tests. Turning the LLM loose is chaos, but having a clear architecture, naming conventions, and other standards can get pretty consistent results.
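One way "standards plus tests" can look in practice, as a sketch under an assumed, hypothetical team rule ("every public top-level function has a return annotation"): a small check built on the standard-library `ast` module that a test suite could run against each module's source.

```python
import ast

def missing_return_annotations(source: str) -> list[str]:
    """Return names of public top-level functions that lack a return annotation.

    Encodes a hypothetical team rule; swap in whatever your standards doc says.
    """
    tree = ast.parse(source)
    offenders = []
    for node in tree.body:  # top-level statements only
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if not node.name.startswith("_") and node.returns is None:
                offenders.append(node.name)
    return offenders


sample = '''
def good(x: int) -> int:
    return x

def bad(x):
    return x

def _private(x):
    return x
'''

print(missing_return_annotations(sample))  # ['bad']
```

In a real repo you would run this over every module in a test (for example under pytest), so agent-written code that drifts from the written standard fails CI instead of quietly accumulating.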