Hacker News new | ask | show | jobs
by daxfohl 112 days ago
Yeah, even for simple things, it's surprisingly hard to write a correct spec. Or more to the point, it's surprisingly easy to write an incorrect spec and think it's correct, even under scrutiny, and so it turns out that you've proved the wrong thing.

There was a post a few months ago demonstrating this for various "proved" implementations of leftpad: https://news.ycombinator.com/item?id=45492274

This isn't to say it's useless; sometimes it helps you think about the problem more concretely and document it using known standards. But I'm not super bullish on "proofs" being the thing that keeps AI in line. First, like I said, they're easy to specify incorrectly, and second, they become incredibly hard to prove beyond a certain level of complexity. But I'll be interested to watch the space evolve.

(Note I'm bullish on AI+Lean for math. It's just the "provably safe AI" or "provably correct PRs" that I'm more skeptical of).

1 comments

>But I'm not super bullish on "proofs" being the thing that keeps AI in line.

But do we have anything that works better than some form of formal specification?

We have to tell the AI what to do and we have to check whether it has done that. The only way to achieve that is for a person who knows the full context of the business problem and feels a social/legal/moral obligation not to cheat to write a formal spec.

Code review, tests, a planning step to make sure it's approaching things the right way, enough experience to understand the right size problems to give it, metrics that can detect potential problems, etc. Same as with a junior engineer.

If you want something fully automated, then I think more investment in automating and improving these capabilities is the way to go. If you want something fully automated and 100% provably bug free, I just don't think that's ever going to be a reality.

Formal specs are cryptic beyond even a small level of complexity, so it's hard to tell if you're even proving the right thing. And proving that an implementation meets those specs blows up even faster, to the point that a lot of stuff ends up being formally unprovable. It's also extremely fragile: one line code change or a small refactor or optimization can completely invalidate hundreds of proofs. AI doesn't change any of that.

So that's why I'm not really bullish on that approach. Maybe there will be some very specific cases where it becomes useful, but for general business logic, I don't see it having useful impact.