| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by ocfnash 698 days ago
	The computer did find the answers itself. I.e., it found "even integers" for P1, "{1,1}" for P2, and "2" for P6. It then also provided provided a Lean proof in each case.

3 comments

freehorse 697 days ago

It would make a lot of sense for the lean-code-formalisation of the problems done by the researchers fed to the AI to be provided. Not assuming bad intent in not providing them, but it would help understand better the results.

link

nnarek 698 days ago

formal definition of first theorem already contain answer of the problem "{α : ℝ | ∃ k : ℤ, Even k ∧ α = k}" (which mean set of even real numbers).if they say that they have translated first problem into formal definition then it is very interesting how they initially formalized problem without including answer in it

link

Smaug123 698 days ago

(You're talking to one of the people who was part of the project, which is why I took @ocfnash's answer as authoritative: they did not cheat.)

link

Xelynega 698 days ago

If they're talking to the people who are part of the project I'd hope the answer would contain detail and not expect to be taken as authoritative.

link

golol 698 days ago

I would expect that in their data which they train AlphaProof on they have some concept of a "vague problem" whoch could just look like

{Formal description of the set in question} = ?

And then Alphaproof has to find candidate descriptions of this set and prove a theorem that they are equal to the above.

I doubt they would claim to solve the problem if they provided half of the answer.

link

puttycat 697 days ago

> I doubt they would claim to solve the problem if they provided half of the answer.

Stranger things have happened

link

chx 697 days ago

> I doubt they would claim to solve the problem if they provided half of the answer.

This falls under extraordinary claims require extraordinary proof and we have seen nothing of the sort.

link

sebzim4500 697 days ago

To be fair, that isn't half the answer it's like 99% of the answer.

They clarified above that it provided the full answer though.

link

JyB 697 days ago

The deepmind team has a history of being misleading. The great StarCraft 2 strategist bot is still in mind.

link

totoglazer 697 days ago

What’s the story with that bot? Always thought it was cool. Was that all smoke and mirrors?

link

topato 697 days ago

I think maybe parent comment is referring to it essentially just employing a zerg rush but with the speed and reaction time of an AI? Not 100% sure... Unrelated, iirc the starcraft functionality was an early example of generalizing a pretrained NN, alphaGO, and showing that it could adapt to learn and defeat games across strategic domains, especially after it learned so much strategy from the most difficult, widely played, and most strategically-varied physical game available.

link

pishpash 698 days ago

Exactly, a problem and its answer are just different ways of describing the same object. Every step of a proof is a transformation/translation of the same object. It would be disingenuous to say that some heavy lifting isn't done in formalizing a problem but it seems that step is also performed by a machine:

"We established a bridge between these two complementary spheres by fine-tuning a Gemini model to automatically translate natural language problem statements into formal statements, creating a large library of formal problems of varying difficulty."

I'm confused, is the formalization by Gemini or "manually"? Which is it?

link

cygaril 698 days ago

Come up with many possible answers, formalize them all, and then try to prove or disprove each of them.

link

Davidzheng 697 days ago

This is probably partially what they did idk why it's downvoted lol

link

riku_iki 698 days ago

its not clear if theorem is actual input formal definition, or formal definition was in different form.

link

Davidzheng 698 days ago

Can you elaborate on how it makes guesses like this? Does it do experiments before? Is it raw LLM? Is it feedback loop based on partial progress?

link

Sharlin 698 days ago

"AlphaProof is a system that trains itself to prove mathematical statements in the formal language Lean. It couples a pre-trained language model with the AlphaZero reinforcement learning algorithm, which previously taught itself how to master the games of chess, shogi and Go."

link

taneq 697 days ago

Huh, so MCTS to find the ‘best’ token using a (relatively) small, quick language model? Sounds like an interesting approach to small model text generation too…

link

JKCalhoun 698 days ago

Yeah I am not clear the degree to which this system and LLMs are related. Are they related? Or is AlphaProof a complete tangent to CHatGPT and its ilk?

link

gowld 698 days ago

It's not an English LLM (Large Language Model).

It's a math Language Model. Not even sure it's a Large Language Model. (Maybe shares a foundational model with an English LLM; I don't know)

It learns mathematical statements, and generates new mathematical statements, then uses search techniques to continue. Similar to Alpha Go's neural network, what makes it new and interesting is how the NN/LLM part makes smart guesses that drastically prune the search tree, before the brute-force search part.

(This is also what humans do to solve math probrems. But humans are really, really slow at brute-force search, so we really almost entirely on the NN pattern-matching analogy-making part.)

link

nextos 697 days ago

These kind of LLMs are also very interesting for software engineering. It's just a matter of replacing Lean with something that is more oriented towards proving software properties.

For example, write a formal specification of a function in Dafny on Liquid Haskell and get the LLM to produce code that is formally guaranteed to be correct. Logic-based + probability-based ML.

All GOFAI ideas are still very useful.

link

lucioperca 697 days ago

You can also verify software like compilers in Lean:

https://aws.amazon.com/blogs/opensource/lean-into-verified-s...

link

sebzim4500 698 days ago

My reading of it is that it uses the same architecture as one of the Gemini models but does not share any weights with it. (i.e it's not just a finetune)

link

_heimdall 697 days ago

This is really interesting. I would have expected the understanding to be that humans make a guess, test it, and learn from what did or did not work. The lessons learned from the prior tests would impact future guesses.

Do you know if a system like the OP is learning from failed tests to guide future tests, or is it a truly a brute force search as if it were trying to mine bitcoin?

link

Thorrez 697 days ago

This quote from the article sounds like it learns from failed tests:

>We trained AlphaProof for the IMO by proving or disproving millions of problems, covering a wide range of difficulties and mathematical topic areas over a period of weeks leading up to the competition. The training loop was also applied during the contest, reinforcing proofs of self-generated variations of the contest problems until a full solution could be found.

link

bbor 697 days ago

yeah but it doesn't understand the exact syntax on an absolute level, does it...? I understood this to be the same as any language model applied to programming languages (aka Formal Languages). Is that mistaken?

link

Zondartul 697 days ago

As far as I understand, and I may be wrong here, the system is composed of two networks: Gemini and AlphaZero. Gemini, being an ordinary LLM with some fine-tunes, only does translation from natural to formal language. Then, AlphaZero solves the problem. AlphaZero, unburdened with natural language and only dealing with "playing a game in the proof space" (where the "moves" are commands to the Lean theorem prover), does not hallucinate in the same way an LLM does because it is nothing like an LLM.

link

danielheath 697 days ago

Yes, but the problem space means that invalid outputs can be quickly identified - whereas general programming isn’t necessarily amenable to rapid checks.

link