|
I have been following recent progress in the formalization of mathematical proofs in Lean, particularly in the context of large language models. One prominent advocate of this approach is Terence Tao, who regularly writes about developments in this area. From a programmer's perspective, this raises an interesting parallel. Models such as Sonnet or Opus 4.5 can generate thousands of lines of code per hour. I can review the output, ask the model to write tests, iterate on the result, and after several cycles become confident that the software is sufficiently correct. For centuries, mathematicians developed proofs by hand, using pen and paper, and were able to check the proofs of their peers. In the context of LLMs, however, a new problem may arise. Consider an LLM that constructs a proof in Lean 4 iteratively over several weeks, resulting in more than 1,000,000 lines of Lean 4 code and concluding with a QED. At what point is a mathematician no longer able to confirm with confidence that the proof is correct? Such a mathematician might rely on another LLM to review the proof, and that system might also report that it is correct. We may reach a stage where humans can no longer feasibly verify every proof produced by LLMs due to their length and complexity. Instead, we rely on the Lean compiler, which confirms formal correctness, and we are effectively required to trust the toolchain rather than our own direct understanding. |
Usually the point of a proof is not to figure out whether a particular statement is true (which may be of little interest by itself, see the Collatz conjecture), but to develop some good ideas _while_ proving that statement. So there's not much value in 1 million lines of verified Lean by themselves. You'd want to study the (Lean) proof hoping to find some new math invented in it, or a particular trick worth noticing.
An LLM may first develop a proof in natural language, then establish its correctness by autoformalizing it in Lean. In that case the result may be worth something.
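
To make that last idea concrete, here is a toy sketch of what "natural-language proof first, Lean second" could look like. The theorem name `even_add_even` and the choice of example are my own, purely for illustration, and it assumes plain Lean 4 without Mathlib. The informal argument sits in the comment and the formal proof mirrors it step by step, so a reader can study the idea without digging through the formal text.

```lean
-- Informal proof (the part a human would actually read):
--   If a = 2k and b = 2m, then a + b = 2k + 2m = 2(k + m), so a + b is even.

-- Hypothetical formalization of the same argument.
-- "Even" is spelled out as "equal to 2 * something" to avoid extra imports.
theorem even_add_even (a b : Nat)
    (ha : ∃ k, a = 2 * k) (hb : ∃ m, b = 2 * m) :
    ∃ n, a + b = 2 * n :=
  match ha, hb with
  | ⟨k, hk⟩, ⟨m, hm⟩ =>
    -- The witness is k + m; distributivity closes the goal.
    ⟨k + m, by rw [hk, hm, Nat.mul_add]⟩

-- Lists the axioms the proof depends on, which is something a human
-- can still audit even when the proof body is too long to read.
#print axioms even_add_even
```

At a million lines, the parts a person can still realistically check are the theorem statement and the axioms it rests on; whether the proof body contains reusable ideas is a separate question that the checker does not answer.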