Hacker News new | ask | show | jobs
by cevi 548 days ago
The error rate of human mathematical work is not zero, but it does go down exponentially with the amount of time that the mathematician spends carefully thinking about the problem. Mistakes tend to be the result of typos, time pressure, or laziness - showing your work to others and having them check it over does help (it's one of the reasons we have peer review!), but is absolutely not necessary.

If the error rate is low enough - and by simply spending a constant factor more time finding and correcting errors, you can get it below one in a million - then you do get a virtuous feedback loop even without tying in a proof-checker. That's how humans have progressed, after all. While you are right to say that the proof-checker approach certainly is not trivial, it is currently much easier than you would expect - modern LLMs are surprisingly good at converting math written in English directly to math formalized in Lean.

I do think that LLMs will struggle to learn to catch their mistakes for a while. This is mostly because the art of catching mistakes on your own is not taught well (often it is not taught at all), and the data sets that modern LLMs train on probably contain very, very few examples of people applying this art.

A tangent: how do human mathematicians reliably manage to catch mistakes in proofs? Going line-by-line through a proof and checking that each line logically follows from the previous lines is what many people believe we do, but it is actually a method of last resort - something we only do if we are completely lost and have given up on concretely understanding what is going on. What we really do is build up a collection of concrete examples and counterexamples within a given subject, and watch how the steps of the proof play out in each of these test cases. This is why humans tend to become much less reliable at catching mistakes when they leave their field of expertise - they haven't yet built up the necessary library of examples to allow them to properly interact with the proofs, and must resort to reading line by line.