As evidenced by the reaction to Devin, no, it’s not real.

There’s a limit beyond which agent-generated code is, in general, not reliable.

All of the people who claim otherwise (like the Devin videos) have been shown to be fake [1] or cherry-picked.

Having agents generate arbitrary code to solve arbitrary problems is. Not. A. Solved. Problem.

Yet.

…no matter how many AI bros currently claim otherwise.

Being able to decompose complex problems into parts small enough to be solved by current models would be a big deal if it were real.

(Because currently the SoTA can’t reliably do this; this should not be a remotely controversial claim to people familiar with this space.)

So, tl;dr: extraordinary claims require extraordinary evidence, which is absent here, as far as I can tell. They specifically call out in the paper that generated actions are overly specific and don’t always work. But as I said, it’s doing well on the leaderboard, so it’s clearly doing something that works; there’s just noooooo way of seeing what.

[1] - https://www.zeniteq.com/blog/devins-demo-as-the-first-ai-sof...