| Tried it a few weeks ago for a task (I had a few dozen files in an open source repo that I wanted to write similar tests for). I gave it one example and then asked it to do the work for the other files. It was able to do about half the files correctly. But it ended up taking an hour and costing >$50 in OpenAI credits, and it took me longer to debug, fix, and verify the work than it would have taken to do the work manually. My take: a good glimpse of the future, once a few more Moore’s Law doublings and model improvement cycles make it 10x better, 10x faster, and 10x cheaper. But it's probably not yet worth using for real work, as opposed to playing with it for curiosity, learning, and understanding. Edit: the task was writing the tests in this PR, given the code plus one test as an example: https://github.com/roboflow/inference/pull/533 This commit was the manual example: https://github.com/roboflow/inference/pull/533/commits/93165... This commit adds the ones partially written by OpenDevin: https://github.com/roboflow/inference/pull/533/commits/65f51... |
I have found it immensely useful for a handful of one-off tasks, but it's not yet a mission-critical part of my workflow (the way e.g. Copilot is).
Core model improvements (better, faster, cheaper) will definitely be a tailwind for us. But there are also many things we can do in the abstraction layer _above_ the LLM to make agents more capable, and there's a lot we can do from a UX perspective (e.g. IDE integrations, better human-in-the-loop experiences, etc.).
So even if models never get better (doubtful!), I'd continue to watch this space. It's getting better every day.