Hacker News new | ask | show | jobs
by cubefox 974 days ago
I wonder whether we could combine formal proof checkers (like the Lean proof checker) with language models that generate synthetic conjecture-proof pairs in a formal language like Lean.

The Lean proof checker could be used to automatically verify whether the synthetic proofs written by the language model are correct. This information could be used to provide an RL reward signal applied to the original language model, which would result in it writing better proofs. (Or we train a new model using the correct synthetic proofs of the previous round as training data.) And then the process repeats. So the model would self-train using its synthetic training data, without further human intervention.

We could even make this process more adversarial. First we split the generator language model into two: One which generates conjectures, and one which tries to prove/disprove them in Lean. Then add a predictor model which tries to predict whether a synthetic proof is verified by the Lean proof checker. The lower the predicted probability that the proof will be correct, the more reward gets the proof-generator model if it did indeed provide a correct proof.

Finally, we add another model which tries to predict the reward the proof-generator model will get for a given synthetic conjecture. Then the conjecture-generator model is rewarded for conjectures that are predicted to yield a high reward in the proof-generator model. So conjectures that are neither too hard not too easy for the proof-generator model.

So we would expect that the whole system would progressively create harder and harder synthetic proofs, which in turn allows for better and better self-training of the proof-generator.

It seems this could in principle scale to superhuman ability in generating proofs. The process would be somewhat similar GANs or to self-play in AlphaGo Zero.

I think the hard part is the initial bootstrapping part, to get the whole process off the ground. Because the initial training of the generator models has to be done with human provided training data (Lean proofs). But once the synthetic proofs are good enough, the system would self-train itself automatically.

4 comments

This workflow is an explicit goal of the Lean 4 devs. The official Zulip chat has a channel dedicated to the interface between the two:

https://leanprover.zulipchat.com/#streams/219941/Machine%20L...

That would be interesting, though this link requires an account.
It would also be interesting to train these models on the entire math literature. One would want to use this to verify the math literature, translating the proofs there to checkable forms. Eventually, we would get a model that knows every theorem and every correct proof in the math literature.
Sounds like you've stumbled into the wonderful world of machine-learning guided proof synthesis! While I don't think the full system you're describing has been built yet, many similar systems and pieces have. In terms of the initial phase of supervised learning on existing proofs to prove new ones, there's TacticToe (https://arxiv.org/abs/1804.00596), Tactician (https://arxiv.org/pdf/2008.00120.pdf), CoqGym/ASTactic (https://arxiv.org/abs/1905.09381), Proverbot9001 (https://arxiv.org/abs/1907.07794), and Diva (https://dl.acm.org/doi/10.1145/3510003.3510138#sec-terms), among others. Most of these have some sort of language model within them, but if you're specifically looking for the LLM's that have been big recently, there's GPT-f (https://arxiv.org/abs/2009.03393), Baldur (https://arxiv.org/abs/2303.04910), and COPRA (https://arxiv.org/abs/2310.04353), though currently these models don't seem as effective as the specialized non-LLM tools. In terms of using reinforcement learning to learn beyond human written proofs, there's TacticZero (https://openreview.net/forum?id=edmYVRkYZv), this paper from OpenAI (https://arxiv.org/pdf/2202.01344.pdf), rlCoP (https://arxiv.org/abs/1805.07563), the HOList line of work (https://arxiv.org/pdf/1905.10006.pdf), and HyperTree Proof Search (https://arxiv.org/abs/2205.11491), as well as some in progress work I'm working on with a team at the University of Massachusetts.
Wow, that's a great overview. While most of it goes over my head, it is interesting that quite a lot of research has been done in this direction.

The system I outlined above was inspired by this intriguing paper:

https://arxiv.org/abs/2207.14502

It is superficially about a different topic, namely generating and solving "programming puzzles", basically a type of unit test, that are then verified by the python interpreter. This system seems quite analogous to one where the programming puzzles are replaced by conjectures in a language like Lean, the proposed puzzle solutions are instead proposed proofs, and the python interpreter is replaced by the proof assistant.

I just discovered that there seems to be at least one guy working on applying a similar system to formal mathematics: https://www.cs.cmu.edu/~jlaurent/pdf/reports/thesis-proposal... He actually cites the paper above.

There are people doing this already. For Lean, an example is https://morph.so/blog/the-personal-ai-proof-engineer/
If I understand this correctly, they want to use an AI model for "autoformalization", which sounds like "using a language model to translate natural language proofs into Lean proofs". Which is cool but much less ambitious than the self-train system I described above. I guess AI technology isn't yet far enough to make such a proposal work.
Given a billion dollars or so, I guess one moght get pretty far. The roadblock might be more that nobody is seeing a lot of market potential.