Hacker News new | ask | show | jobs
by benlivengood 1519 days ago
I think for short proofs that fit in a GPT-style NLP predictor's input, it could produce an initial list of likely "next steps" which could be further scored by the adversarially-trained scoring model which has some intuition for which proof search strategies are likely to work, the combination allowing each model to have different areas of expertise (NLP being technically competent locally and the scorer providing global judgement). The entire proof search would correspond to a single play in AlphaGo with proof search happening entirely within the monte-carlo exploration of likely next-steps as opposed to a sequence of plays as in Go, and play only happening when an automatic proof-checker validates the final proof. Automatic proof-checking could also prune syntactically invalid continuations from the monte-carlo search tree, and probably provide some similar optimizations.

My guess is that such a model would learn to compose a collection of lemmas that score as useful, produce sub-proofs for those, and then combine them into the final proof. This still mimics a sequence of plays closely enough that the scoring model could recognize "progress" toward the goal and predict complementary lemmas until a final solution is visible.

It may even work for very long proofs by letting it "play" a portion of its early lemmas which remain visible to the scorer but the proof sequence is chunked into as many pieces as the NLP needs to see it all and backtracking behind what was output is no longer possible. Once a lemma and its proof are complete it can be used or ignored in the future proof but modifying it sounds less useful.

1 comments

I think the probability of a GPT-style approach learning to generate mathematically interesting next steps is close to zero. The structure GPT captures is a sense of plausibility of a next step and while you could argue that it's possible given enough data to train the network to only rate mathematically interesting steps highly, the rate of that training would be extremely slow. GPT-style training is most effective for language where there is a certain degree of flexibility. It tends to learn plausibility and provides relevance mostly from context. This approach seems akin to searching in the language space of English+Mathematics over all proofs and hoping to learn mathematically relevant continuations. I think the search space is way too large and positive feedback is far too rare for the model to converge to anything useful.

I highly doubt you could get such a model to produce a useful collection of lemmas. Even if you could, I don't think a useful collection of lemmas progressing towards a goal has anywhere near the same structure as scoring a position in a game.

Ultimately for any AlphaZero technique to work you need a powerful position evaluation network. Almost all the magic happens in that network's pruning. It's quite remarkable that you can get a search of 80,000 nodes to do better that a search of 35,000,000 nodes. All of that is due to efficient pruning from the network. You haven't convinced me of any plausible approach to getting such a network here.