|
|
|
|
|
by TimPC
1520 days ago
|
|
I think this approach hinges on how you build and define the network that flags the "most promising search state". I'd argue this is a far harder problem in theorem proving than it is in various games. Before AlphaZero all kinds of position evaluation techniques existed in each of these games and were in widespread use for over 25 years. Comparatively, it's even a hard problem to define anything resembling a board where the structure is fixed enough to apply something similar to positional evaluation to. What you want is a network that flags next steps in theorems as promising and allows you to effectively ignore most of the search space. If you have a good way to do this I think AlphaZero would be a promising approach. But even if you get AlphaZero to work it doesn't solve the problem of understanding. AlphaZero made go players better by giving them more content to study but it didn't explicitly teach knowledge of the game. Strategically in Go, it's actually rather unsatisfying because it seems to win most of its games by entirely abandoning influence for territory and then relying on its superior reading skills to pick fights inside the influence where it often seems to win from what pro players would previously describe as a disadvantageous fight. This means the learnings are limited for players who have less good reading skills than AlphaZero as they can't fight nearly as well, which suggests abandoning the entire dominant strategy it uses. |
|
My guess is that such a model would learn to compose a collection of lemmas that score as useful, produce sub-proofs for those, and then combine them into the final proof. This still mimics a sequence of plays closely enough that the scoring model could recognize "progress" toward the goal and predict complementary lemmas until a final solution is visible.
It may even work for very long proofs by letting it "play" a portion of its early lemmas which remain visible to the scorer but the proof sequence is chunked into as many pieces as the NLP needs to see it all and backtracking behind what was output is no longer possible. Once a lemma and its proof are complete it can be used or ignored in the future proof but modifying it sounds less useful.