| HN Mirror

I think the probability of a GPT-style approach learning to generate mathematically interesting next steps is close to zero. The structure GPT captures is a sense of plausibility of a next step and while you could argue that it's possible given enough data to train the network to only rate mathematically interesting steps highly, the rate of that training would be extremely slow. GPT-style training is most effective for language where there is a certain degree of flexibility. It tends to learn plausibility and provides relevance mostly from context. This approach seems akin to searching in the language space of English+Mathematics over all proofs and hoping to learn mathematically relevant continuations. I think the search space is way too large and positive feedback is far too rare for the model to converge to anything useful.

I highly doubt you could get such a model to produce a useful collection of lemmas. Even if you could, I don't think a useful collection of lemmas progressing towards a goal has anywhere near the same structure as scoring a position in a game.

Ultimately for any AlphaZero technique to work you need a powerful position evaluation network. Almost all the magic happens in that network's pruning. It's quite remarkable that you can get a search of 80,000 nodes to do better that a search of 35,000,000 nodes. All of that is due to efficient pruning from the network. You haven't convinced me of any plausible approach to getting such a network here.