| To add to this: I was going through Devin's 'pass' diffs from SWE-bench. Every one that I ended up tracing back to the actual issue made changes that would reduce maintainability or introduce potential side effects. I think it may be useful as a suggestion in a red-green-refactor model, but it will end up producing code that is hard to maintain and modify. Note this one here, which introduced circular dependencies and changed a function that only accepted points into one that appears to accept any geometric object, but only added handling for lines. Domain knowledge and writing maintainable code are beyond generative transformers. https://github.com/CognitionAI/devin-swebench-results/blob/m... You simply can't get past what Gödel and Rice proved with current technology. It is like when visual languages were supposed to replace programmers. Code isn't really the issue, the details are. |
And to be fair, lots of humans are already at least this bad at writing code. And lots of companies are happy with garbage code so long as it addresses an immediate business requirement.
So Devin wouldn't have to advance much to be competitive in certain simple situations where people don't care about anything that happens more than 2 quarters into the future.
I also agree that producing good code which meets real business needs is a hard problem. In fact, any AI which can truly do the work of a good senior software engineer can probably learn to do a lot of other human jobs as well.