>> Of course it writes a lot of code. It gets paid per token.
I don't buy it. I think a much more likely reason it leans towards adding code is that deleting code carries inherent risk: it can break things in major or minor ways, visibly or invisibly. Adding new code, on the other hand, is a lot safer: the only parts that can break are those the AI touched inside its own working context. So it doesn't have to go down rabbit holes and potentially create bigger and bigger messes.
Then local models shouldn't suffer from the same problems, but they do. I'd say they just aren't trained in the direction of "less code == better long-term maintainability", rather than there being some grand "increased-token-usage" conspiracy.
You can certainly steer them a bit to reduce the issue parent talks about, but they still go in that direction whenever they can, adding stuff on top of stuff, piling hacks/shims on top of other hacks/shims, just like many human developers :)
Training data is the masses of code from everyone.
Restrict that data to just the best of the best, the tersest of the tersest, and we’d see better output. I don’t think people are sharing that kinda stuff (Jane Street’s gems stay locked up), and even if they did my presumption is that it’d be too narrow and demanding for general audiences.
Big hopes for the long future, damned to some degree of mediocrity in the near term mass product.