Hacker News new | ask | show | jobs
by csense 100 days ago
It seems like sound testing methodology to identify important theorems related to the code, prove them, and then verify the proof.

Verification gets sold as "bulletproof" but I'm skeptical for a couple reasons:

- How do you establish the relationship between the code and the theorem? Lean theorem can be applied to zlib implemented in Lean, what if you want to check zlib implemented in a normal programming language like C, JS, Zig, or whatever?

- How do you know the key properties mean what you think they mean? E.g. the theorem says "ZlibDecode.decompressSingle (ZlibEncode.compress data level) = .ok data" but it feels like it would be very easy to accidentally prove ∃ x s.t. decompress(compress(x)) == x while thinking you proved ∀ x, decompress(compress(x)) == x.

I've tried Lean and Coq and...I don't really like them. The proofs use specialized programming languages. And they seem deliberately designed to require you to use a context explorer to have any hope of understanding the proof at all. OTOH a normal unit test is written in a general purpose programming language (usually the same one as the program being tested), I'm much more comfortable checking that a Claude-written unit test does what I think it's doing than a Claude-written Lean proof of correctness.

2 comments

The article does not reveal it to me either how the existing code would be mapped to Lean and back. The impression from zlib example is that I'd be expected to program in Lean. No way it's going to happen. The language is too complex for me and my average colleague. We're also not going to have two parallel implementations in ordinary language and Lean and compare them with 'differential random testing' (see https://aws.amazon.com/blogs/opensource/lean-into-verified-s... someone linked in the discussion), that's just too taxing for bigger products, let alone we typically don't have enough time to do one implementation right.

The gap of having succinct, expressive, powerful and executable specification to be able to continuously verify AI-generated programs is real, but I don't see how Lean alone closes it. If the author's intention was to attract community to help build that out with Lean in the center, it's not clear to me where to even start. Since the author provided no hints or direction, I've a feeling it's not clear to them either.

Lean and Coq might not be the right choice. Perhaps something like Dafny, F*, or Why3, where code and theorems live together.

Nevertheless, Lean 4 has closed the gap and it's closer to those than Coq at this stage.