This has to come with an asterisk, which is that participants had approximately 90 minutes to work on each problem while AlphaProof computed for three days for each of the ones it solved. Looking at this problem specifically, I think that many participants could have solved P6 without the time limit.
(I think you should be very skeptical of anyone who hypes AlphaProof without mentioning this - which is not to suggest that there's nothing there to hype)
Very plausible, but that would also be noteworthy. As I've mentioned in some other comments here, (as far as I know) we outside of DeepMind don't know anything about the computing power required to run alphaproof, and the tradeoff between computing power required and the complexity of problems it can address is really key to understanding how useful it might be.
I agree that the result is important regardless. But the tradeoff of computing time/cost with problem complexity is hugely important to think about. Finding a proof in a formal language is trivially solvable in theory since you just have to search through possible proofs until you find one ending with the desired statement. The whole practical question is how much time it takes.
Three days per problem is, by many standards, a 'reasonable' amount of time. However there are still unanswered questions, notably that 'three days' is not really meaningful in and of itself. How parallelized was the computation; what was the hardware capacity? And how optimized is AlphaProof for IMO-type problems (problems which, among other things, all have short solutions using elementary tools)? These are standard kinds of critical questions to ask.
Though, if you start solving problems that humans can't or haven't solved, then questions of capacity won't matter much. A speedup in the movement of the maths frontier would be worth many power stations.
For some time a 'superhuman math AI' could be useful for company advertising and getting the attention of VCs. But eventually it would be pretty clear that innovative math research, with vanishingly few exceptions, isn't very useful for making revenue. (I am a mathematician and this is meant with nothing but respect for math research.)
> Finding a proof in a formal language is trivially solvable in theory since you just have to search through possible proofs until you find one ending with the desired statement. The whole practical question is how much time it takes.
No, in my experience the whole practical question is, can it be found automatically, or can it not be found automatically? Because there is an exponential search space that conventional automated methods will not be able to chew through, it either works, or it doesn't. AlphaProof shows that for some difficult IMO problems, it works.
More or less. Modern theorem provers, even fully automatic ones, can prove incredibly difficult problems if given enough time. With 3 days and terabytes of memory, perhaps they could? Would be interesting to compare Alphaproof with a standard theorem prover that is given similarly astronomical computing resources.
That is an interesting thought. But I doubt that standard automated theorem proving techniques would have solved this in a 100 years, even with immense resources, as all they do is (basically) brute force search, and it remains an exponential search space. Therefore 3 days does not really matter, indeed, 3 months would still be a result that 10 years ago I would have thought impossible.
(I think you should be very skeptical of anyone who hypes AlphaProof without mentioning this - which is not to suggest that there's nothing there to hype)