Hacker News new | ask | show | jobs
by vessenes 534 days ago
I'd call this paper a "big deal" in that it is a normalization of, very fair summary of, and indication that there is a future for, LLMs in pure mathematics from one of its leading practitioners.

On HN here, we've spent the last few years talking and thinking a lot about LLMs, so the paper might not include much that would be surprising to math-curious HN'ers. However, there is a large cohort of research mathematicians out there that likely doesn't know much about modern AI; Terence is saying there's a little utility in last-gen models (GPT-4), and he expects a lot of utility out of combining next-gen models with Lean.

Again, not surprising if you read his blog, but I think publishing a full paper in AMS is a pretty important moment.

2 comments

Our world is increasingly defined by software without correctness proofs. Our tools are too clumsy, and we're just not smart enough, so we accept this situation. AI-verified code could become one of the most economically important applications of machine learning, when we cross the threshold where this becomes feasible.

I'm a mathematician, and we struggle with the purpose of a proof: Is it to verify, or also to explain so we can generalize? Machine proofs tend to be inscrutable.

While "the singularity" is unlikely anytime soon, we should recall that before the first atomic test there were calculations to insure that we didn't ignite the atmosphere. We're going to need proofs that we have control of AI, and there's an obvious conflict of interest in trusting AI's say so here. We need to understand these proofs.

This is also my take on this. IMHO, LLMs + theorem provers have the potential to make formal methods cheap enough to use more widely.

And we should give more credit to the theorem prover part of the equation, which comes in part from old AI symbolic efforts.

A good thing about machine proofs is that, just like code, they can be refactored. I also use LLMs when writing code, but the code that I end up pushing is almost never exactly what the LLM has generated. I don't really see the problem with that. It's way less taxing for me mentally to get an LLM to generate definition and implementations and just refactor them quickly. I would expect LLMs for lean to be similar in the future.
Interesting thoughts, thanks. It seems to me a model trained to generate Lean could also be purposed to explain a large Lean proof, and that’s very interesting. So much of modern math is limited to extremely small cohorts.

None of that is dispositive inre provable safety, obviously.

LLMs as they are I postulate would not work well for this. But, purpose built stochastic auto complete with a type checker to reject the junk? That could be actually useful. Funnily enough it's also a domain of application that wouldn't make any money at all. It would have to be an offline LLM that is reasonably efficient to execute locally.
Why would it not make any money?