|
|
|
|
|
by addaon
10 days ago
|
|
I've been playing with related ideas recently, and can talk about them at length... but one thing that's surprised me is just how effective frontier models (ChatGPT-5.5 in particular) are at completing certain manual proofs in the Roqc (né Coq) proof assistant. The proofs aren't always pretty, but ChatGPT can often prove something in minutes and 10 - 100 iterations that would take me, a human who has limited but non-zero proof assistant experience but significant domain experience in the lemmas being proven, much much longer. The reason I think this is relevant and interesting in the context of the (short) article is because it challenges basic assumptions about how certain formal method tooling works. Frama-C, Ada/SPARK, etc are focused on generating proof obligations that can be automatically discharged by CVC5, Z3, etc; with a relatively painful fallback of manually proving the obligation in Rocq. The most common workflow seems to be to discover that an obligation is "hard" (not automatically discharged), and to restructure the set of visible lemmas and assertions at the obligation generation point in the code to try to make it "easy"; or even restructuring the code to try to make it easy. Which, in a world where Roqc proofs are doubly expensive (first because they're a challenge for a human to write; second because they tend to require quite a bit of maintenance churn as the code and proof around them evolves), makes sense. But if Roqc proofs are "automatically discharged with AI in the loop", this cost delta goes away -- and it becomes possible to think about writing proofs the same we (usually) write code, with human readability and clarity as the first goal, and [compiler|prover] optimization a distant second. Which I find quite exciting, although I haven't fully digested the implications yet. |
|