| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by TimPC 1520 days ago

I think this approach hinges on how you build and define the network that flags the "most promising search state". I'd argue this is a far harder problem in theorem proving than it is in various games. Before AlphaZero all kinds of position evaluation techniques existed in each of these games and were in widespread use for over 25 years.

Comparatively, it's even a hard problem to define anything resembling a board where the structure is fixed enough to apply something similar to positional evaluation to. What you want is a network that flags next steps in theorems as promising and allows you to effectively ignore most of the search space. If you have a good way to do this I think AlphaZero would be a promising approach.

But even if you get AlphaZero to work it doesn't solve the problem of understanding. AlphaZero made go players better by giving them more content to study but it didn't explicitly teach knowledge of the game. Strategically in Go, it's actually rather unsatisfying because it seems to win most of its games by entirely abandoning influence for territory and then relying on its superior reading skills to pick fights inside the influence where it often seems to win from what pro players would previously describe as a disadvantageous fight. This means the learnings are limited for players who have less good reading skills than AlphaZero as they can't fight nearly as well, which suggests abandoning the entire dominant strategy it uses.

1 comments

benlivengood 1520 days ago

I think for short proofs that fit in a GPT-style NLP predictor's input, it could produce an initial list of likely "next steps" which could be further scored by the adversarially-trained scoring model which has some intuition for which proof search strategies are likely to work, the combination allowing each model to have different areas of expertise (NLP being technically competent locally and the scorer providing global judgement). The entire proof search would correspond to a single play in AlphaGo with proof search happening entirely within the monte-carlo exploration of likely next-steps as opposed to a sequence of plays as in Go, and play only happening when an automatic proof-checker validates the final proof. Automatic proof-checking could also prune syntactically invalid continuations from the monte-carlo search tree, and probably provide some similar optimizations.

My guess is that such a model would learn to compose a collection of lemmas that score as useful, produce sub-proofs for those, and then combine them into the final proof. This still mimics a sequence of plays closely enough that the scoring model could recognize "progress" toward the goal and predict complementary lemmas until a final solution is visible.

It may even work for very long proofs by letting it "play" a portion of its early lemmas which remain visible to the scorer but the proof sequence is chunked into as many pieces as the NLP needs to see it all and backtracking behind what was output is no longer possible. Once a lemma and its proof are complete it can be used or ignored in the future proof but modifying it sounds less useful.

TimPC 1519 days ago

I think the probability of a GPT-style approach learning to generate mathematically interesting next steps is close to zero. The structure GPT captures is a sense of plausibility of a next step and while you could argue that it's possible given enough data to train the network to only rate mathematically interesting steps highly, the rate of that training would be extremely slow. GPT-style training is most effective for language where there is a certain degree of flexibility. It tends to learn plausibility and provides relevance mostly from context. This approach seems akin to searching in the language space of English+Mathematics over all proofs and hoping to learn mathematically relevant continuations. I think the search space is way too large and positive feedback is far too rare for the model to converge to anything useful.

I highly doubt you could get such a model to produce a useful collection of lemmas. Even if you could, I don't think a useful collection of lemmas progressing towards a goal has anywhere near the same structure as scoring a position in a game.

Ultimately for any AlphaZero technique to work you need a powerful position evaluation network. Almost all the magic happens in that network's pruning. It's quite remarkable that you can get a search of 80,000 nodes to do better that a search of 35,000,000 nodes. All of that is due to efficient pruning from the network. You haven't convinced me of any plausible approach to getting such a network here.