by porcoda
60 days ago
I’ve had similar experiences with code I’ve proven correct, although my issues were of a more common variety than the overflow issue: subtle spec bugs. (I think the post mentions the denial of service issue as related to this: a spec gap.) If you have a spec that isn’t correct, you can certainly write code that conforms to that spec and write proofs to support it. It just means you have verified a program that does something other than what you intended. This is one of the harder parts of verification: clearly expressing your intention as a human. As programs get more complex, specs get harder to write, which means it isn’t uncommon to have Lean or Rocq proofs for everything only to later find “nope, it has a bug that ultimately traces back to a subtle specification defect.” Once you’ve gone through this a few times you quickly realize that tools like Lean and Rocq are tricky to use effectively. I kinda worry that the “proof assistants will fix AI correctness” idea will lead to a false sense of assurance if the specs that capture human intention don’t get scrutinized closely. Otherwise we’ll likely have lots of proofs for code that isn’t the code the humans actually intended, due to spec flaws.
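To make the "spec gap" idea concrete, here is a minimal sketch in Python rather than Lean or Rocq, with the spec written as a plain predicate instead of a theorem. Everything in it (`weak_spec`, `strong_spec`, `bogus_sort`) is a hypothetical name made up for illustration, not something from the post. The weak spec only says the output is ordered, so a function that throws away its input satisfies it perfectly.

```python
from collections import Counter


def is_sorted(xs):
    """True if every element is <= the one after it."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def weak_spec(inp, out):
    # Hypothetical under-specified contract: only requires ordered output.
    return is_sorted(out)


def strong_spec(inp, out):
    # Closes the gap: output must also contain exactly the input's elements.
    return is_sorted(out) and Counter(inp) == Counter(out)


def bogus_sort(xs):
    # "Conforms" to weak_spec while being useless: it drops all the data.
    return []


data = [3, 1, 2]
print(weak_spec(data, bogus_sort(data)))    # the weak spec accepts it
print(strong_spec(data, bogus_sort(data)))  # the stronger spec rejects it
print(strong_spec(data, sorted(data)))      # a real sort passes both
```

A proof against `weak_spec` would be a perfectly valid proof. It just proves something other than what the human meant, which is the failure mode described above.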
But that's not saying the proofs are an issue. Usually the spec you can reasonably prove in Lean or another tool (say TLA+ or Z3, depending on your kind of program) has to be heavily simplified and rest on a lot of assumptions.
However, that is powerful.
It doesn't mean your program doesn't have bugs.
It means this big scary complicated algorithm you think works but are skeptical of doesn't have bugs. So when you encounter one, you know the bug is elsewhere, and you start really looking at the boundaries: what could be misspecified, whether the assumptions given to the prover are actually true, etc.
It rules out the big scary thing that everyone would otherwise assume is the cause of the bug.
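A rough sketch of what "checking whether the assumptions given to the prover are actually true" can look like, in Python. The scenario is hypothetical (not necessarily the overflow from the post): a midpoint calculation is proven correct over unbounded integers, but the deployed code runs on 32-bit ints. `wrap_int32` and the `midpoint_*` functions are made-up names that model the difference.

```python
def wrap_int32(n):
    """Reduce n to a signed 32-bit value, modelling fixed-width wraparound."""
    return (n + 2**31) % 2**32 - 2**31


def midpoint_ideal(lo, hi):
    # What the proof reasons about: unbounded mathematical integers.
    return (lo + hi) // 2


def midpoint_int32(lo, hi):
    # Model of the same formula at runtime, where lo + hi can wrap around.
    return wrap_int32(lo + hi) // 2


def midpoint_safe(lo, hi):
    # Rearranged so no intermediate value exceeds hi (for 0 <= lo <= hi).
    return lo + (hi - lo) // 2


lo, hi = 2**30, 2**31 - 1  # both fit in a signed 32-bit int

print(lo <= midpoint_ideal(lo, hi) <= hi)                 # the proven property holds
print(midpoint_int32(lo, hi) < 0)                         # the runtime model goes negative
print(midpoint_safe(lo, hi) == midpoint_ideal(lo, hi))    # the rearranged form agrees
```

The algorithm itself is fine under the assumption it was proven with. The bug lives at the boundary between that assumption and the machine, which is exactly where the proof tells you to go looking.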
This has been insanely valuable to me lately. It is also something I was never really able to do before AI came along. Vibe coding proofs about my programs is IMO one of the killer apps of AI, because it is rarely done and there aren't a ton of great resources yet on how to do it well.