Their earlier systems have solved other Erdős problems that people had worked on. This one was more monumental and had seen a lot more prior effort that failed to solve it, but it isn't a one-off.
This is true, but I still think the relevant question is, how many did they try before they found one that yielded to LLMs? The conclusion is very different if they tried 100 open problems and succeeded at one.
Yeah, maybe it's just the Texas Sharpshooter Fallacy basically, but with AI.
And if it isn't, we should find out very soon. If AI has got as good as OpenAI's post implies, then we should soon see a veritable blooming in the production of mathematical results, by lay people no less. No mathematicians needed! OpenAI say that their secret LLM solved the planar unit distance problem "autonomously", and the companion remarks say it one-shotted it. While the companion remarks make it clear that there was a lot of refinement and improvement work done by humans, everyone seems to agree that the AI did the job by itself.
If that's true, if we're really at that level of autonomous mathematical reasoning ability, then we should see hundreds, even thousands, of open problems suddenly solved in a matter of years if not months. We'll just have to wait and see.
Yes, and since some of these are being solved by the same person, I think my point is even more relevant: if you try 1000 problems, solve a few, and report only those few, it seems like just a matter of time until the rest are solved. But if you report that it didn’t work on the others, your conclusion is different.
I think it is important to temper expectations in light of the fact that these announcements are coming from a startup with shady values that is looking to IPO imminently, and thus represent the most biased and misleading take on the situation possible.
>> If that's true, if we're really at that level of autonomous mathematical reasoning ability, then we should see hundreds, even thousands, of open problems suddenly solved in a matter of years if not months.