| HN Mirror



	by zahlman 11 days ago
	> how effective frontier models (ChatGPT-5.5 in particular) are at completing certain manual proofs in the Rocq (né Coq) proof assistant. The proofs aren't always pretty, but ChatGPT can often prove something in minutes and 10 to 100 iterations that would take me, a human who has limited but non-zero proof assistant experience but significant domain experience in the lemmas being proven, much much longer. ...

	How do you know that the proofs are themselves correct?


addaon 11 days ago

With the proof checker.

bobkb 11 days ago

I assume your idea is that if the spec and the proof are verified, the generated code is good enough as well?

addaon 11 days ago

Today, I write the code. It's trivial and takes a lot less time than writing the spec, and since I'm using conventional tooling for WCET and stack sizing, it's nice to get those right up front. The LLMs sometimes tweak the code slightly for provability, but this is usually either direct operator replacement (a shift with a multiplication, a bitwise AND with a modulus, etc.) or factoring a block out into a function so a contract can be attached to it. I trust my compiler to undo both (simple arithmetic rewriting and inlining, respectively) with zero to minimal impact on the generated binary.

red75prime 11 days ago

Proof checkers fuel the AI hype by outputting "valid" for hallucinated text. /s