In particular, note that a key lemma of crystalline cohomology rests on a mistake. Experts think that it is fixable by virtue that results have depended on it for a long time and no issue was found, but it is not fixed.
I think you are trying to say that this matter has since been resolved and so presumably the whole informal proof somehow resides in literature. I suppose that to the first point you may be right (I'm guessing that it's since been made available), but to the second point I think you are overconfident that similar gaps do not exist.
I admit that my original comment was inaccurate, as it seems to suggest that the gap still exists.
"The International Mathematical Olympiad (IMO) is the World Championship Mathematics Competition for High School students", so not to undermine it but it's below university or graduate level.
Research level mathematics like this is as hard as it gets, and this proof is famously difficult: uses many branches of advanced mathematics, required thousands of pages of proofs, years of work.
Human readable (informal) proofs are full of gaps that all have to be traced back to axioms e.g. gaps that rely on shared intuition, background knowledge and other informal proofs.
It's somewhat like taking rough pseudo code (the informal proof, a mixture of maths and English) and translating that into a bullet-proof production app (the formal proof, in Lean), where you're going to have to specify every step precisely traced back to axioms, handle all the edge causes, fix incorrect assumptions, and fill in the missing parts that were assumed to be straightforward but might not be.
A major part is you also have to formalise all the proofs your informal proof relied on so everything is traced back to the initial axioms e.g. you can't just cite Pythagorus theorem, you have to formalise that too.
So it's an order of magnitude more difficult to write a formal proof compared to an informal one, and even when you have the informal proof it can teams many years of effort.
I’m almost certain this is ignorance on my part, but it seems like this would mean the proof is… possibly wrong? I mean if there are gaps and other informal proofs in there?
Kevin Buzzard told me that the worry that it might in fact be wrong is a huge motivator for him. I also once asked Serge Lang why so much mathematics is
correct (which surprised me coming from programming where everything has bugs), and he said; “people do a large number of consistency checks beyond what is in the published proofs, which makes the chances the claimed results are correct much, much higher.” Another related quote Bryan Birch told me once: “it is always a good idea to prove true theorems.”
> Kevin Buzzard told me that the worry that it might in fact be wrong is a huge motivator for him.
And then, when I raised concerns in Zulip about Lean's metaprogramming facilities being used to trick the pipeline into accepting false proofs, he said the opposite. He even emphasized that the formalization efforts are not for checking proof correctness, but for cataloguing truths we believe in.
This kind of equivocation turned me away from that community, to be honest. That was an extremely frustrating experience.
> “people do a large number of consistency checks beyond what is in the published proofs, which makes the chances the claimed results are correct much, much higher.”
I imagine one bias is because formal verification is such a huge effort, you're only going to do it for really interesting and impactful proofs, which means the proofs that get formal verified will already have been reviewed and critiqued a lot, and will be less likely to have glaring critical errors.
> I’m almost certain this is ignorance on my part, but it seems like this would mean the proof is… possibly wrong?
That's part of the motivation to formalise it. When a proof gets really long and complex, relies on lots of other complex proofs, and there's barely any single expert who has enough knowledge to understand all the branches of maths it covers, there's more chance there's a mistake.
There's a few examples here of errors/gaps/flaws found while formalizing proofs that were patched up:
My understanding is it's common to find a few problems that then need to be patched or worked around. It's a little like wanting to know if a huge codebase is bug-free or not: you might find some bugs if you formalized the code, but you can probably fix the bugs during the process because it's generally correct. There can be cases where it's not fixable though.
I think it was Terrance Tao on the Lex Friedman podcast recently that said that there are very often little mistakes in big proofs, but they are almost always able to be patched around. Its like mathematicians' intuition is tracking some underlying reality and the actual formalization is flexible. Yes sometimes digging down into a small mistake leads to an unbridgeable gap and that route has to be abandoned, but uncannily often such issues have nearby solutions.
Also a lot of errors would be called "typos", not errors. Such as some edge cases missing in the theorem statement which technically makes the theorem false. As long as there's a similar theorem in the same spirit that can be proven, that's what the original was all along.
The level of rigor used in math if sometimes characterized as "sufficient to convince other mathematicians of correctness." So, yeah possibly, but not in a willy nilly way. It's not a proof sketch, it's a proof. It just isn't written in human language designed for communication.
The thing is though that MANY bugs slip through even the most thorough code reviews. As a security researcher, I can tell you there is literally no system out there that doesn't have such a bug in it.
The systems we deal with in software are massive compared with your typical mathematical framework though. But FLT is probably on similar scope.
I have little knowledge in this area, but my understanding is it's like this:
There is a pseudocode app that depends on a bunch of pseudocode libraries. They want to translate that pseudocode app into a real runnable app. They can do that, and it's a good amount of work, but reasonable. The problem is to get the app to run they also need to translate the hundreds or thousands of pseudocode libraries into actual libraries. Everything from OS APIs, networking libs, rendering libs, language standard libs all need ro be converted from specs and pseudocode to real code to actually run the app. And that's a ton of work.
No, some of the harder work has been done. Translating human-readable proofs into machine-readable ones is also very hard work and an area of active research.
This is a significantly harder problem than winning gold in IOM. A large part of it is figuring out how to represent some of the relevant ideas in Lean at all.
IMO problems are stated in and solved by math a high schooler could understand. Getting the definitions right (one of the harder parts of mechanizing proofs IME) is a different beast altogether.
Formalizing Wiles' proof requires translating hundreds of pages of sophisticated mathematics with implicit reasoning steps into a precise logical framework, which is fundamentally different from the pattern-matching AI uses to solve competition problems.
All three claims of gold medal performance on IMO 2025 that I'm aware of solved the first 5 problems, that were designed to be solvable by application of standard techniques, but got stumped on the sixth problem that was a bit more unusual. So it does seem like state-of-the-art models solve competition problems by recognizing which kind of problem it is and applying a corresponding solution template. Which is not too different from human competitors exploiting common question patterns, but humans seem to be able to degrade more gracefully by falling back to a more explorative mode when none of the standard tricks seem to apply.
My understanding is that the proof doesn't exist in written form in its entirety.
Plus, Kevin Buzzard is a world expert with some ideas for how to better organize the proof. In general, formalization leads to new understanding about mathematics.
Something people outside of mathematics don't tend to appreciate is that mathematicians are usually thinking deeply about what we already know, and that work reveals new structures and connections that clarify existing knowledge. The new understanding reveals new gaps in understanding, which are filled in, and the process continues. It's not just about collecting verifiably true things.
Even if somehow the OpenAI algorithm could apply here, we'd get less value out of this whole formalization exercise than to have researchers methodically go through our best understanding of our best proof of FLT again.
Proof assistant code is high reliability, there is no room to fudge it. This is perhaps the one place where you can really see how bad LLMs are when you care about reliability.
Proofs, sure, but not definitions. A human needs to be sure that the definitions align with that they expect. Unfortunately, humans generating correct definitions and LLM's generating correct proofs are not independent problems.
Actually, some proofs take longer for the computer to verify than for a human (even an unskilled typist) to type out. Several hours to evaluate a two-page document isn't unusual. (I prefer to write optimised proofs, but it can take me weeks to find the correct abstractions for that.)
Are you talking about Lean 4? Lean 4 is usually pretty fast at verifying proofs and most proofs are created in interactive mode so are checked as they are typed.
A "two-page document" (perhaps 200-300 lines of code) would typically check in a matter of seconds to, at most, a few minutes. If it took hours, it would mean the file contains a pathological case that a developer would be expected to identify and fix. It is absolutely not a normal or "not unusual" occurrence.
1. Load a complex data structure, e.g. from JSON. (I do not consider typing out the JSON to be part of the human's job.)
2. Process the data, using the magic of functional programming. (Performance? What's that? Performance is not a priority.)
3. Add some helpful lemmas.
4. Prove the data processing stage was correct… but proving this in general is haaaard, so just make automation that's capable of proving it for most cases… eventually.
5. Great! It takes 3 minutes to run on my tiny examples. Now run it on the real data.
Why? Coding assistants tend to do even better in contexts where they have tools like type checkers and linters to verify their implementations. This area seems actually uniquely well suited to LLM usage.
When I asked experts on formal proofs a year ago, their intuition was that there isn't enough formal proofs out there for LLMs to be very good at the syntax.
It's, as far as I know, quite hard to teach an LLM things it doesn't know.
I see people on Zulip using Copilot to write Lean proofs, and they have some success, but the quality is really bad right now, creating long, unmaintainable proofs. New users get stuck, thinking they're 90% of the way to the end, but really the whole thing probably should be scrapped and they should start over.
It's a bit frustrating because, before Copilot, new users would come with proofs and you could spend some time helping them write better proofs and they'd learn things and gain skills, but now it's not clear that this is time well spent on my part. Copilot is not going to learn from my feedback.
In particular, note that a key lemma of crystalline cohomology rests on a mistake. Experts think that it is fixable by virtue that results have depended on it for a long time and no issue was found, but it is not fixed.