Hacker News
by turzmo 26 days ago
Not denying that these advances are impressive, but it is important to consider that this is a cherry-picked result. This doesn’t mean that AI can now be expected to do problems of similar or lower difficulty, but that it happened to work well on one problem. What you won’t see is how many others they had to try to get this result.

Their earlier systems have solved other Erdos problems that people had worked on. This one was more monumental and had seen a lot more prior effort that failed to solve it, but it isn't a one-off.
This is true, but I still think the relevant question is, how many did they try before they found one that yielded to LLMs? The conclusion is very different if they tried 100 open problems and succeeded at one.
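To put rough numbers on that, here is a minimal sketch of how much the denominator matters. The attempt counts are hypothetical (nobody outside the lab knows how many problems were tried), and the Wilson score interval is just one convenient way to put a 95% range on a success rate; `wilson_interval` is a name made up for this illustration.

```python
import math


def wilson_interval(successes: int, attempts: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a success rate."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p = successes / attempts
    denom = 1 + z * z / attempts
    center = (p + z * z / (2 * attempts)) / denom
    half = z * math.sqrt(p * (1 - p) / attempts + z * z / (4 * attempts * attempts)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical scenarios: the same single reported success,
# with different (unreported) numbers of attempts behind it.
for attempts in (1, 10, 100):
    low, high = wilson_interval(1, attempts)
    print(f"1 success in {attempts:>3} attempts: rate plausibly {low:.1%} to {high:.1%}")
```

One reported success is consistent with very different underlying success rates depending on how many attempts went unreported, which is why the number of failures matters.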
Yeah, maybe it's just the Texas Sharpshooter Fallacy basically, but with AI.

And if it isn't, we should find out very soon. If AI has got as good as OpenAI's post implies, then we should soon see a veritable blooming in the production of mathematical results, by lay people no less. No mathematicians needed! OpenAI say that their secret LLM solved the planar unit distance problem "autonomously" and the companion remarks say it one-shotted it. While the companion remarks make it clear that there was a lot of refinement and improvement work done by humans, everyone seems to agree that the AI did the job by itself.

If that's true, if we're really at that level of autonomous mathematical reasoning ability, then we should see hundreds, even thousands, of open problems suddenly solved in a matter of years if not months. We'll just have to wait and see.

Yes, as some of these are being solved by the same person, I think my point is even more relevant: you try 1000 problems and solve a few, and only report the few, and it just seems like a matter of time until the rest are solved. But if you report that it didn’t work on the others, your conclusion is different.

I think it is important to temper expectations in light of the fact that these announcements are coming from a startup company with shady values looking to imminently IPO, and thus represent the most biased and misleading take of the situation possible.

Is that 326 solved? As per my comment above?

>> If that's true, if we're really at that level of autonomous mathematical reasoning ability, then we should see hundreds, even thousands, of open problems suddenly solved in a matter of years if not months.

Stressing "hundreds, even thousands".

No, problem #326. You didn't give a timeline of a few days.

Google released a paper about solving 9 more Erdos problems for an average of $100 each:

https://arxiv.org/html/2605.22763v1

In a year I think we'll probably have seen hundreds of open problems solved, even if there is some low-hanging-fruit exhaustion bottleneck.