| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by porcoda 107 days ago

I’ve had similar experiences with code I’ve proven correct, although my issues were of the more common variety than the overflow issue - subtle spec bugs. (I think the post mentions the denial of service issue as related to this: a spec gap)

If you have a spec that isn’t correct, you can certainly write code that conforms to that spec and write proofs to support it. It just means you have verified a program that does something other than what you intended. This is one of the harder parts of verification: clearly expressing your intention as a human. As programs get more complex these get harder to write, which means it isn’t uncommon to have lean or rocq proofs for everything only to later find “nope, it has a bug that ultimately traces back to a subtle specification defect.” Once you’ve gone through this a few times you quickly realize that tools like lean and rocq are tricky to use effectively.

I kinda worry that the “proof assistants will fix ai correctness” will lead to a false sense of assurance if the specs that capture human intention don’t get scrutinized closely. Otherwise we’ll likely have lots of proofs for code that isn’t the code the humans actually intended due to spec flaws.

4 comments

jmalicki 107 days ago

I have experience with similar things!

But that's not saying the proofs are an issue - usually the spec you can reasonably prove in lean or another prover, say TLA+ or Z3 depending on your kind of program - has to be overly simplified and have a lot of assumptions.

However, that is powerful.

It doesn't mean your program doesn't have bugs.

It means this big scary complicated algorithm you think works but are skeptical doesn't have bugs - so when you encounter one, you know the bug is elsewhere, and you start really looking at the boundaries of what could be misspecified, if the assumptions given to the prover are actually true, etc.

It eliminates the big scary thing everyone will think is the cause of the bug as the actual cause.

This has been insanely valuable to me lately. It is also something I never really was able to do before the help of AI - vibe coding proofs about my programs is IMO one of the killer apps of AI, since there aren't a ton of great resources yet about how to do it well since it is rarely done.

cubefox 107 days ago

> This has been insanely valuable to me lately.

This surprises me. Formal verification so far has been a very niche thing apart from conventional type systems. I didn't think lack of vibe coding was much of a bottleneck in the past. Where do you use it?

baq 107 days ago

The problem is implementing anything approximately twice is a hard sell… this is no longer true, though - TLA+ models are cheap now. You should be using them when writing any sort of distributed systems, which is basically everything nowadays.

jmalicki 107 days ago

Roughly anything that, say, has the complexity of a leetcode medium level problem that isn't already an extremely well known algorithm.

Any moderately complex thread safety thing with a few moving parts (e.g. there are multiple mutexes involved in various parts of the system, verify no deadlocks).

The lack of vibe coding has been a bottleneck for literally everything before.

When I see people say the hate vibe coding, I think "why do you hate formal verification? Because you could be spending your time on formal verification instead of removing "code smells" that don't hurt anything from vibe code."

Nevermark 107 days ago

That is damn insightful.

somat 107 days ago

Whenever I read an article about formal verification systems there is always that nagging thought in the back of my head. Why can you trust your formal verification system to be bug free but you can't trust the program. should not the chance of bugs be about equal in both of them?

You have a program that does something and you write another program to prove it. What assurance do you have that one program has fewer bugs then the other? Why can one program have bugs but the other can't? How do you prove that you are proving the right thing? It all sort of ties into Heisenberg's uncertainty theorem. A system cannot be fully described from within that system.

Don't get me wrong, I think these are great systems doing great work. But I always feel there is something missing in the narrative.

I think a more practical view is that a program is already a sort of proof. there is a something to be solved and the program provides a mechanism to prove it. but this proof may be and probably is incorrect, as bugs are fixed it gets more and more correct. A powerful but time consuming tool to try and force correctness is to build the machine twice using different mechanisms. Then mismatched output indicates something is wrong with one of them. and your job as an engineer is to figure out which one. This is what formal verification brings to the table. The second mechanism.

majormajor 107 days ago

> Whenever I read an article about formal verification systems there is always that nagging thought in the back of my head. Why can you trust your formal verification system to be bug free but you can't trust the program. should not the chance of bugs be about equal in both of them?

A bug in the formal verification tool could be potentially noticed by any user of that formal verification tool. (And indirectly by any of their users noticing a bug about which they say "huh, I thought the tool told me that was impossible.")

A bug in your program can only be potentially noticed by you and your users.

There are also entirely categories of bugs that may not be relevant. For instance, if I'm trying to prove correctness of a distributed concurrent system and I use a model+verifier that verifies things in a sequential, non-concurrent way, then I don't have to worry about the prover having all the same sort of race conditions as my actual code.

But yeah, if you try to write your own prover to prove your own software, you could screw up either. But that's not what is being discussed here.

rdevilla 107 days ago

> It all sort of ties into Heisenberg's uncertainty theorem. A system cannot be fully described from within that system.

Surely you are talking about Godel incompleteness, not Heisenberg's uncertainty principle; in which case they're actually not the same system - the verification/proof language is more like a metalanguage taking the implementation language as its object.

(Godel's observation for mathematics was just that for formal number systems of sufficient power, you can embed that metalanguage into the formal number system itself.)

ngruhn 107 days ago

I think formal verification brings a bit more to the table. The logical properties are not just a second implementation. They can be radically simpler. I think quantifiers are doing a lot of work here (forall/exists). They are not usable directly in regular code. For example, you can specify that a shortest path algorithm must satisfy:

    forall paths P from A to B:
        len(shortest(A,B)) <= len(P)

That's much simpler than any actual shortest path algorithm.

UltraSane 107 days ago

AWS has said that formal verification enables their engineers to implement aggressive performance optimizations on complex algorithms without the fear of introducing subtle bugs or breaking system correctness. It helped double the performance of the IAM ACL evaluation code

zarzavat 107 days ago

Formal verification is just an extra step up from the static types that you might have in a language such as Rust.

Common static types prove many of the important properties of a program. If I declare a variable of type String then the type checker ensures that it is indeed a String. That's a proof. Formal verification takes this further and proves other properties such as the string is never empty.

Common static types are very effective. Many users of Rust or Haskell will claim that if a program compiles then it usually works correctly when they go to run it.

However there is a non-linear relationship between probability of program correctness and the amount of types required to achieve it. Being almost certain requires vastly more types than just being confident.

That's the real issue with formal verification, being 75% sure and having less code is better than being 99% sure in most situations, though if I were programming a radiotherapy machine I might think differently.

raincole 107 days ago

> should not the chance of bugs be about equal in both of them?

Why?

Are you saying that all the programs ever written have the exact same chance of bugs? A hello world is as buggy as a vibe-coded Chromium clone?

If you accept the premise that different programs have different chances to have bugs, then I'd say:

1. Simpler programs are likely less buggy.

2. Programs used by more people are likely less buggy.

3. Programs maintained by experts who care about correctness are likely less buggy.

4. Programs where the stakes are higher are likely less buggy.

All things considered, I think it's fair to say Lean is likely less buggy then a random program written by me at weekend.

> Heisenberg's uncertainty theorem

It has nothing to do with the uncertainty principle. If you think otherwise, it means your understanding of uncertainty principle comes from sci-fi :)

CJefferson 107 days ago

The chances of significant bugs in lean which lead to false answers to real problems are extremely small (this bug still just caused a crash, but is still bad). Many, many people try very hard to break Lean, and think about how proofs work, and fail. Is it foolproof? No. It might have flaws, it might be logic itself is inconsistent.

I often think of the ‘news level’ of a bug. A bug in most code wouldn’t be news. A bug which caused lean to claim a real proof someone cared about was true, when it wasn’t, would in the proof community the biggest news in a decade.

Dylan16807 107 days ago

> should not the chance of bugs be about equal in both of them?

Even if it is, the verification is still very useful. The verifier is going to run for a few minutes and probably not hit many edge cases. The chance it actually hits a bug is low, and the chance a bug makes it wrongly accept your program is a lot lower. Especially if it has to output a proof at the end. Meanwhile it's scrutinizing every single edge case your program has.

Nevermark 107 days ago

A program is a hard proof of existence.

It runs (maybe crashes), therefore … it exists.

The tension between spec bugs vs. implementation bugs is real. But i will take a bug in a situation where the implementation has been verified any day.

Working over what we really want is problem solving in the problem domain.

As apposed to going into the never ending implementation not-what-we-were trying to solve weeds.

andai 107 days ago

Let me try wrap my head around this.

I notice two classes of bugs in my own programs:

- I meant the code to do X, but it does Y

- I meant the code to do X, and it does X, but X causes problems I didn't foresee

A proof assistant can help you prove the code does X, but it can't help you prove doing X doesn't cause problems you didn't foresee.

In other words, it can prove the soundness of the implementation, but not the design?

Is that right? Though I imagine if there are internal contradictions in the design, lean would catch those too.

So the issue would be "internally consistent yet incorrect designs"?

In the case of TFA, the issue was exhaustiveness, right? "What happens if..."

That sounds like a pretty important quality as far as security goes. Is there a way to make sure everything is actually verified?

I heard actually verifying everything is prohibitively expensive though, like it took 10-20 years to verify seL4 (10K LoC).

brookst 107 days ago

Been thinking about this a lot recently.

I think we need a way to verify the specs. A combo of formal logic and adversarial thinking (probably from LLMs) that will produce an exhaustive list of everything the program will do, and everything it won’t do, and everything that is underspecified.

Still not quite sure what it looks like, but if you stipulate that program generation will be provable, it pushes the correctness challenge up to the spec (and once we solve that, it’ll be pushed up to the requirements…)

frumplestlatz 107 days ago

What’s important is to prove useful, high-level properties derived from the specs. The specs of program behavior are just the price of admission.

brookst 107 days ago

I agree. It’s kind of like secure boot, in reverse: the high level stuff has to be complete and correct enough that the next level down has a chance to be complete and correct.