by mrothroc
111 days ago
My experience is similar to yours: LLMs can write excellent code, but you really have to drive them the right way. I use a harness to drive long-running autonomous agents that create production code. (Not open source, but it is an actual product used by companies.)
The key is understanding how they fail, then driving them in a way that sidesteps those failures. If you let them run too long, they become self-contradictory. If you break long work into discrete chunks, they can still fail, but the failure mode changes: they forget things. That is much easier to catch, because you can use something like a linter or even a simple regex for "// TODO" to find what was left out.
Once you set up your pipeline to orchestrate the agents so the errors become easily detectable, and you have gates that check for those errors, the quality goes way, way up.
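To make the "gate" idea concrete, here is a minimal sketch in Python of the "// TODO" check mentioned above. It is not the harness described in this comment. The names `find_todos`, `gate_passes`, and `Finding` are hypothetical, and it assumes each chunk's output arrives as a mapping from file path to file contents.

```python
import re
from dataclasses import dataclass

# Matches "// TODO" with optional whitespace after the slashes.
TODO_PATTERN = re.compile(r"//\s*TODO\b")


@dataclass
class Finding:
    """One leftover marker found in agent output (hypothetical type)."""
    path: str
    line_number: int
    text: str


def find_todos(files: dict[str, str]) -> list[Finding]:
    """Return one Finding per line that contains a TODO marker."""
    findings = []
    for path, content in files.items():
        for number, line in enumerate(content.splitlines(), start=1):
            if TODO_PATTERN.search(line):
                findings.append(Finding(path, number, line.strip()))
    return findings


def gate_passes(files: dict[str, str]) -> bool:
    """The gate passes only when no TODO markers remain."""
    return not find_todos(files)


if __name__ == "__main__":
    # Example chunk output: one file still has an unfinished stub.
    chunk_output = {
        "a.js": "const x = 1;\n// TODO: implement\nreturn x;",
        "b.js": "ok();",
    }
    for finding in find_todos(chunk_output):
        print(f"{finding.path}:{finding.line_number}: {finding.text}")
```

A pipeline could run a check like this after every chunk and send the work back to the agent whenever the gate fails, rather than letting the gap surface later.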