|
I think we're going to see a lot of this. I worked in self-driving and the stuff always 95% worked. Never 100. This is useful in some ways. Thinking about situations like pre-release software testing, there are exploratory test cases that are simply too numerous to ever have a human perform economically. A lot of AI is going to do this kind of very low-valued grunt work where it doesn't matter if it's 90% or 99% correct, it's the fact that it can get done at all. A lot of this work is "additive" in the sense that, it's just too expensive to do today (with a human). The work product of these systems is best seen as a "rough draft" or "suggestion". It's a first cut, not the last word. On the other hand, a lot (most?) of the meat-and-potatoes coding done today, is situations where things have to WORK. Stuff where correctness absolutely matters--billing/money/settlement (calculating tax, handling returns, moving money between accounts), a lot of OS code for things like memory management / locking / resource management, drug dosing, reservation management, etc. Granted, this stuff is a lot more complex and nuanced than the code of an average CRUD app, but then, I also don't spend my days implementing bcrypt, quicksort, or self-synchronizing Unicode parsing. We have libraries for that. The question is whether we're better off relying on agents to write a bunch of grunty code, or come up with better top-level organization / code structures, that doing it "by hand" is the better approach. I'm actually optimistic that we can do better code-wise. But I'd love to see how things develop. Maybe we wouldn't need AI if we just had better programming languages. |
Further, we're adding better test systems to Sweep. For now, you can just comment to get Sweep to cover the edge cases and write tests. Happy to take any other feedback.