Hacker News new | ask | show | jobs
by Davidzheng 698 days ago
Can you elaborate on how it makes guesses like this? Does it do experiments before? Is it raw LLM? Is it feedback loop based on partial progress?
1 comments

"AlphaProof is a system that trains itself to prove mathematical statements in the formal language Lean. It couples a pre-trained language model with the AlphaZero reinforcement learning algorithm, which previously taught itself how to master the games of chess, shogi and Go."
Huh, so MCTS to find the ‘best’ token using a (relatively) small, quick language model? Sounds like an interesting approach to small model text generation too…
Yeah I am not clear the degree to which this system and LLMs are related. Are they related? Or is AlphaProof a complete tangent to CHatGPT and its ilk?
It's not an English LLM (Large Language Model).

It's a math Language Model. Not even sure it's a Large Language Model. (Maybe shares a foundational model with an English LLM; I don't know)

It learns mathematical statements, and generates new mathematical statements, then uses search techniques to continue. Similar to Alpha Go's neural network, what makes it new and interesting is how the NN/LLM part makes smart guesses that drastically prune the search tree, before the brute-force search part.

(This is also what humans do to solve math probrems. But humans are really, really slow at brute-force search, so we really almost entirely on the NN pattern-matching analogy-making part.)

These kind of LLMs are also very interesting for software engineering. It's just a matter of replacing Lean with something that is more oriented towards proving software properties.

For example, write a formal specification of a function in Dafny on Liquid Haskell and get the LLM to produce code that is formally guaranteed to be correct. Logic-based + probability-based ML.

All GOFAI ideas are still very useful.

You can also verify software like compilers in Lean:

https://aws.amazon.com/blogs/opensource/lean-into-verified-s...

Sure, but Lean has very little support for software problems compared to Isabelle, Coq or Dafny right now.

Those 3 also have a lot of training data as well. Hoping Lean gets more support as it is very friendly.

As a basic learning resource focused on software engineering, there's [1]. But nothing more advanced I am aware of.

[1] The Hitchhiker's Guide to Logical Verification. https://cs.brown.edu/courses/cs1951x/static_files/main.pdf

My reading of it is that it uses the same architecture as one of the Gemini models but does not share any weights with it. (i.e it's not just a finetune)
This is really interesting. I would have expected the understanding to be that humans make a guess, test it, and learn from what did or did not work. The lessons learned from the prior tests would impact future guesses.

Do you know if a system like the OP is learning from failed tests to guide future tests, or is it a truly a brute force search as if it were trying to mine bitcoin?

This quote from the article sounds like it learns from failed tests:

>We trained AlphaProof for the IMO by proving or disproving millions of problems, covering a wide range of difficulties and mathematical topic areas over a period of weeks leading up to the competition. The training loop was also applied during the contest, reinforcing proofs of self-generated variations of the contest problems until a full solution could be found.

Reading between the lines a bit, that does answer the question I had though don't think I I clarified very well.

I read that to say the model's token weights are adjusted as it goes, so in an LLM sense it is kind of learning. It isn't reasoning through an answer in the way a human does though. Meaning, the model is still just statistically predicting what an answer may be and checking if it worked.

I wouldn't chalk that up to learning at all. An AI solving complex math doesn't even seem too impressive to me with the predictive loop approach. Computers are well adept at math, throwing enough compute hardware at it to brute force an answer isn't suprising. I'd be really impressed if it could reliably get there with a similar number of failed attempts as a human, that could indicate that it really learned and reasoned rather than rammed through a mountain of failed guesses.

yeah but it doesn't understand the exact syntax on an absolute level, does it...? I understood this to be the same as any language model applied to programming languages (aka Formal Languages). Is that mistaken?
As far as I understand, and I may be wrong here, the system is composed of two networks: Gemini and AlphaZero. Gemini, being an ordinary LLM with some fine-tunes, only does translation from natural to formal language. Then, AlphaZero solves the problem. AlphaZero, unburdened with natural language and only dealing with "playing a game in the proof space" (where the "moves" are commands to the Lean theorem prover), does not hallucinate in the same way an LLM does because it is nothing like an LLM.
Yes, but the problem space means that invalid outputs can be quickly identified - whereas general programming isn’t necessarily amenable to rapid checks.
I mean, aren’t you just describing formal language syntax? Seems like a fundamentally similar situation —- the computer can automatically flag any syntax errors in a millisecond by checking it against the generating grammar for that language. Thats what makes a formal language in the first place, I think!

I do think this language is considerably more robust than the typical programming language, which means a sound program is more likely to end up being valid/“correct”. But still, that’s a difference of degree, not kind, IMO