Hacker News
by eldavido 1043 days ago
I think we're going to see a lot of this. I worked in self-driving and the stuff always 95% worked. Never 100.

This is useful in some ways. Think about situations like pre-release software testing: there are exploratory test cases that are simply too numerous to ever have a human perform economically. A lot of AI is going to do this kind of very low-value grunt work, where it doesn't matter whether it's 90% or 99% correct; what matters is that it can get done at all. A lot of this work is "additive" in the sense that it's just too expensive to do today (with a human).
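
To make the exploratory-testing point concrete, here's a rough sketch of the kind of grunt work I mean: throw thousands of random inputs at a function and flag anything that breaks a basic invariant. The function `apply_discount` and the invariant are made up for illustration, not taken from any real codebase.

```python
import random


def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical function under test: price after a percentage discount."""
    return price_cents - (price_cents * percent) // 100


def explore(trials: int = 10_000, seed: int = 0) -> list[tuple[int, int, int]]:
    """Try many random inputs and collect any that violate a basic invariant."""
    rng = random.Random(seed)  # fixed seed so a failing run can be reproduced
    failures = []
    for _ in range(trials):
        price = rng.randint(0, 1_000_000)
        percent = rng.randint(0, 100)
        result = apply_discount(price, percent)
        # Invariant: a discount never makes the price negative or higher.
        if not (0 <= result <= price):
            failures.append((price, percent, result))
    return failures


if __name__ == "__main__":
    print(f"{len(explore())} invariant violations found")
```

No human would sit and try ten thousand combinations by hand, and it's fine if a check like this misses some bugs, because the alternative is that nobody runs it at all.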

The work product of these systems is best seen as a "rough draft" or "suggestion". It's a first cut, not the last word.

On the other hand, a lot (most?) of the meat-and-potatoes coding done today is in situations where things have to WORK. Stuff where correctness absolutely matters: billing/money/settlement (calculating tax, handling returns, moving money between accounts), a lot of OS code for things like memory management / locking / resource management, drug dosing, reservation management, etc.

Granted, this stuff is a lot more complex and nuanced than the code of an average CRUD app, but then, I also don't spend my days implementing bcrypt, quicksort, or self-synchronizing Unicode parsing. We have libraries for that. The question is whether we're better off relying on agents to write a bunch of grunty code, or coming up with better top-level organization / code structures, such that doing it "by hand" is the better approach.

I'm actually optimistic that we can do better code-wise. But I'd love to see how things develop. Maybe we wouldn't need AI if we just had better programming languages.

1 comments

I think for teams that want to move fast in a non-critical environment (i.e. not health, finance, etc.), something that works 90% of the time is fine. Getting to 95% takes twice the amount of time but does not provide twice the value. When that last 5% becomes the difference maker, we can fix it later.

Further, we're adding better test systems to Sweep. For now, you can just comment to get Sweep to cover the edge cases and write tests. Happy to take any other feedback.

Sorry, but unless the core business is making statistical predictions, you're wrong. Other industries (like health and finance) still need robust applications with something like 99.99% uptime.