by xeeeeeeeeeeenu 194 days ago

> no prior solutions found.

This is no longer true: a prior solution has just been found[1], so the LLM proof has been moved to Section 2 of Terence Tao's wiki[2].

[1] - https://www.erdosproblems.com/forum/thread/281#post-3325

[2] - https://github.com/teorth/erdosproblems/wiki/AI-contribution...

4 comments

nl 194 days ago

Interesting that, in Terence Tao's words: "(though the new proof is still rather different from the literature proof)"

And even odder that the proof was by Erdos himself and yet he listed it as an open problem!

pfdietz 191 days ago

The theorem is implied by an older result of Erdos, but is not a result of Erdos. Apparently this is because the connection is something called "Rogers' Theorem" that was quite obscure.

https://terrytao.wordpress.com/2026/01/19/rogers-theorem-on-...

"This theorem is somewhat obscure: its only appearance in print is in pages 242-244 of this 1966 text of Halberstam and Roth, where the authors write in a footnote that the result is “unpublished; communicated to the authors by Professor Rogers”. I have only been able to find it cited in three places in the literature: in this 1996 paper of Lewis, in this 2007 paper of Filaseta, Ford, Konyagin, Pomerance, and Yu (where they credit Tenenbaum for bringing the reference to their attention), and is also briefly mentioned in this 2008 paper of Ford. As far as I can tell, the result is not available online, which could explain why it is rarely cited (and also not known to AI tools). This became relevant recently with regards to Erdös problem 281, posed by Erdös and Graham in 1980, which was solved recently by Neel Somani through an AI query by an elegant ergodic theory argument. However, shortly after this solution was located, it was discovered by KoishiChan that Rogers’ theorem reduced this problem immediately to a very old result of Davenport and Erdös from 1936. Apparently, Rogers’ theorem was so obscure that even Erdös was unaware of it when posing the problem!"

TZubiri 194 days ago

Maybe it was in the training set.

magneticnorth 194 days ago

I think that was Tao's point, that the new proof was not just read out of the training set.

rzmmm 194 days ago

The model has multiple layers of mechanisms to prevent carbon copy output of the training data.

TZubiri 194 days ago

forgive the skepticism, but this translates directly to "we asked the model pretty please not to do it in the system prompt"

ffsm8 194 days ago

It's mind boggling if you think about the fact they're essentially "just" statistical models

It really contextualizes the old wisdom of Pythagoras that everything can be represented as numbers / math is the ultimate truth

mikaraento 194 days ago

That might be somewhat ungenerous unless you have more detail to provide.

I know that at least some LLM products explicitly check output for similarity to training data to prevent direct reproduction.

ComplexSystems 193 days ago

The model doesn't know what its training data is, nor does it know what sequences of tokens appeared verbatim in there, so this kind of thing doesn't work.

efskap 194 days ago

Would it really be infeasible to take a sample and do a search over an indexed training set? Maybe a bloom filter can be adapted
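
A rough sketch of the lookup floated above, assuming the training set can be split into word n-grams and indexed ahead of time. The names (`BloomFilter`, `index_corpus`, `overlap_fraction`) and the parameters (5-word n-grams, 2^20 bits, 4 hashes) are illustrative choices, not anything an LLM vendor is known to use. A Bloom filter can report false positives but never false negatives, so a high overlap score is only a hint that a sample may be copied.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def ngrams(text: str, n: int):
    """Return all word-level n-grams of the text."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def index_corpus(docs, n=5, num_bits=1 << 20, num_hashes=4):
    """Add every n-gram of every document to a fresh Bloom filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for doc in docs:
        for gram in ngrams(doc, n):
            bf.add(gram)
    return bf


def overlap_fraction(sample: str, bf: BloomFilter, n: int = 5) -> float:
    """Fraction of the sample's n-grams that the filter reports as seen."""
    grams = ngrams(sample, n)
    if not grams:
        return 0.0
    return sum(gram in bf for gram in grams) / len(grams)
```

The filter only answers "possibly seen" or "definitely not seen", so any hit would still need confirming against the actual corpus. Exact n-gram matching also misses lightly paraphrased text.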

glemion43 194 days ago

Do you have a source for this?

Carbon copy would mean overfitting

fweimer 193 days ago

I saw weird results with Gemini 2.5 Pro when I asked it to provide concrete source code examples matching certain criteria, and to quote the source code it found verbatim. It said in its response that it had quoted the sources verbatim, but that wasn't true at all: they had been rewritten, still in the style of the project it was quoting from, but otherwise quite different, and without a match in the Git history.

It looked a bit like someone at Google subscribed to a legal theory under which you can avoid copyright infringement if you take a derivative work and apply a mechanical obfuscation to it.

NewsaHackO 193 days ago

It is the classic "He made it up"

Der_Einzige 193 days ago

Source is just read the definition of what "temperature" is.

But honestly source = "a knuckle sandwich" would be appropriate here.
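
For reference, a minimal sketch of the usual definition of sampling temperature: the logits are divided by the temperature before the softmax. The function name `softmax_with_temperature` and the example logits are made up for illustration. This only shows what the parameter does to the distribution; it says nothing by itself about whether a given model reproduces training text.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after scaling by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.5)  # concentrates on the top token
flat = softmax_with_temperature(logits, 2.0)   # spreads mass more evenly
```

Lower temperatures push probability toward the most likely token, while higher temperatures flatten the distribution so less likely tokens get sampled more often.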

Den_VR 194 days ago

Unfortunately.

GeoAtreides 193 days ago

does it?

this is a verbatim quote from gemini 3 pro from a chat couple of days ago:

"Because I have done this exact project on a hot water tank, I can tell you exactly [...]"

I somehow doubt an LLM did that exact project, what with not having any ability to do plumbing in real life...

retsibsi 193 days ago

Isn't that easily explicable as hallucination, rather than regurgitation?

cma 193 days ago

I don't think it is dispositive, just that it likely didn't copy the proof we know was in the training set.

A) It is still possible a proof from someone else with a similar method was in the training set.

B) Something similar to Erdos's proof was in the training set for a different problem, along with an alternate solution similar to ChatGPT's, which would be more impressive than A)

CamperBob2 193 days ago

> It is still possible a proof from someone else with a similar method was in the training set.

A proof that Terence Tao and his colleagues have never heard of? If he says the LLM solved the problem with a novel approach, different from what the existing literature describes, I'm certainly not able to argue with him.

mmooss 193 days ago

> A proof that Terence Tao and his colleagues have never heard of?

Tao et al. didn't know of the literature proof that started this subthread.

heliumtera 193 days ago

Does it matter if it copied or not? How the hell would one even define if it is a copy or original at this point?

At this point the only conclusion here is: The original proof was in the training set. The author and Terence did not care enough to find the publication by Erdos himself.

davidhs 193 days ago

It looks like these models work pretty well as natural language search engines and at connecting together dots of disparate things humans haven't done.

pfdietz 193 days ago

They're finding them very effective at literature search, and at autoformalization of human-written proofs.

Pretty soon, this is going to mean the entire historical math literature will be formalized (or, in some cases, found to be in error). Consider the implications of that for training theorem provers.

mlpoknbji 193 days ago

I think "pretty soon" is a serious overstatement. This does not take into account the difficulty in formalizing definitions and theorem statements. This cannot be done autonomously (or, it can, but there will be serious errors) since there is no way to formalize the "text to lean" process.

What's more, there's almost surely going to turn out to be a large amount of human generated mathematics that's "basically" correct, in the sense that there exists a formal proof that morally fits the arc of the human proof, but there's informal/vague reasoning used (e.g. diagram arguments, etc) that are hard to really formalize, but an expert can use consistently without making a mistake. This will take a long time to formalize, and I expect will require a large amount of human and AI effort.

pfdietz 193 days ago

It's all up for debate, but personally I feel you're being too pessimistic there. The advances being made are faster than I had expected. The area is one where success will build upon and accelerate success, so I expect the rate of advance to increase and continue increasing.

This particular field seems ideal for AI, since verification enables identification of failure at all levels. If the definitions are wrong the theorems won't work and applications elsewhere won't work.

p-e-w 193 days ago

Every time this topic comes up people compare the LLM to a search engine of some kind.

But as far as we know, the proof it wrote is original. Tao himself noted that it’s very different from the other proof (which was only found now).

That’s so far removed from a “search engine” that the term is essentially nonsense in this context.

theptip 193 days ago

Hassabis put forth a nice taxonomy of innovation: interpolation, extrapolation, and paradigm shifts.

AI is currently great at interpolation, and in some fields (like biology) there seems to be low-hanging fruit for this kind of connect-the-dots exercise. A human would still be considered smart for connecting these dots IMO.

AI clearly struggles with extrapolation, at least if the new datum is fully outside the training set.

And we will have AGI (if not ASI) if/when AI systems can reliably form new paradigms. It’s a high bar.

davidhs 191 days ago

Maybe if Terence Tao had memorized the entire Internet (and pretty much all media), then maybe he would find bits and pieces of the problem remind him of certain known solutions and be able to connect the dots himself.

But, I don't know. I tend to view these (reasoning) LLMs as alien minds and my intuition of what is perhaps happening under the hood is not good.

I just know that people have been using these LLMs as search engines (including Stephen Wolfram), browsing through what these LLMs perhaps know and have connected together.

cubefox 194 days ago

This illustrates how unimportant this problem is. A prior solution did exist, but apparently nobody knew because people didn't really care about it. If progress can be had by simply searching for old solutions in the literature, then that's good evidence the supposed progress is imaginary. And this is not the first time this has happened with an Erdős problem.

A lot of pure mathematics seems to consist in solving neat logic puzzles without any intrinsic importance. Recreational puzzles for very intelligent people. Or LLMs.

glemion43 194 days ago

It shows that a 'llm' can now work on issues like this today and tomorrow it can do even more.

Don't be so ignorant. A few years ago NO ONE could have come up with something as generic as an LLM, which will help you solve these kinds of problems and also create text adventures and Java code.

danielbln 194 days ago

The goal posts are strapped to skateboards these days, and the WD40 is applied to the wheels generously.

sampullman 193 days ago

Regular WD40 should not be used as bearing lubricant!

danielbln 193 days ago

Exactly!

glemion43 193 days ago

I don't get your pessimism...

Nothing of it was even imaginable and yes the progress is crazy fast.

How can you be so dismissive?

danielbln 193 days ago

You misread my comment.

glemion43 193 days ago

You mean like a small rocket build? Okay :)

BoredPositron 194 days ago

You can just wait and verify instead of the publishing, redacting cycles of the last year. It's embarrassing.

jojobas 193 days ago

It's hard to predict which maths result from 100 years ago surfaces in say quantum mechanics or cryptography.

layer8 193 days ago

The likelihood for that is vanishingly low, though, for any given math result.

antonvs 193 days ago

> "intrinsic importance"

"Intrinsic" in contexts like this is a word for people who are projecting what they consider important onto the world. You can't define it in any meaningful way that's not entirely subjective.

cubefox 193 days ago

Mathematical theorems at least have objectively lower information content, because they merely rule out the impossible, while scientific knowledge also rules out the possible but non-actual.

antonvs 192 days ago

You have it backwards. Mathematical theorems have objectively higher information content, because they rule out the impossible and model possibilities in all possible worlds that satisfy their preconditions. Scientific knowledge can never do more than inductive projections from observations in the single world we have physical access to.

The only thing that saves science from being nothing more than “huh, will you look at that,” is when it can make use of a mathematical model to provide insight into relationships between phenomena.

MattGaiser 194 days ago

There is still enormous value in cleaning up the long tail of somewhat important stuff. One of the great benefits of Claude Code to me is that smaller issues no longer rot in backlogs, but can be at least attempted immediately.

cubefox 194 days ago

The difference is that Claude Code actually solves practical problems, but pure (as opposed to applied) mathematics doesn't. Moreover, a lot of pure mathematics seems to be not just useless, but also without intrinsic epistemic value, unlike science. See https://news.ycombinator.com/item?id=46510353

drob518 193 days ago

I’m an engineer, not a mathematician, so I definitely appreciate applied math more than I do abstract math. That said, that’s my personal preference and one of the reasons that I became an engineer and not a mathematician. Working on nothing but theory would bore me to tears. But I appreciate that other people really love that and can approach pure math and see the beauty. And thank God that those people exist because they sometimes find amazing things that we engineers can use during the next turn of the technological crank. Instead of seeing pure math as useless, perhaps shift to seeing it as something wonderful for which we have not YET found a practical use.

Ar-Curunir 193 days ago

Even if pure math is useless, that’s still okay. We do plenty of things that are useless. Not everything has to have a use.

drob518 193 days ago

I’m not sure I agree. Pure math is not useless because a lot of math is very useful. But we don’t know ahead of time what is going to be useless vs. useful. We need to do all of it and then sort it out later.

If we knew that it was all going to be useless, however, then it’s a hobby for someone, not something we should be paying people to do. Sure, if you enjoy doing something useless, knock yourself out… but on your own dime.

jstanley 194 days ago

Applications for pure mathematics can't necessarily be known until the underlying mathematics is solved.

Just because we can't imagine applications today doesn't mean there won't be applications in the future which depend on discoveries that are made today.

cubefox 193 days ago

Well, read the linked comment. The possible future applications of useless science can't be known either. I still argue that it has intrinsic value apart from that, unlike pure mathematics.

Thorrez 193 days ago

There are many cases where pure mathematics became useful later.

https://www.reddit.com/r/math/comments/dfw3by/is_there_any_e...

teiferer 194 days ago

It's hard to know beforehand. Like with most foundational research.

My favorite example is number theory. Before cryptography came along it was pure math, an esoteric branch for just number nerds. Turns out, super applicable later on.

baq 194 days ago

You’re confusing immediately useful with eventually useful. Pure maths has found very practical applications over the millennia - unless you don’t consider it pure anymore, at which point you’re just moving goalposts.

cubefox 194 days ago

No, I'm not confusing that. Read the linked comment if you're interested.

TheOtherHobbes 194 days ago

You are confusing that. The biggest advancements in science are the result of the application of leading-edge pure math concepts to physical problems. Newtonian physics, relativistic physics, quantum field theory, Boolean computing, Turing notions of devices for computability, elliptic-curve cryptography, and electromagnetic theory all derived from the practical application of what was originally abstract math play.

Among others.

Of course you never know which math concept will turn out to be physically useful, but clearly enough do that it's worth buying conceptual lottery tickets with the rest.

amazingman 194 days ago

It's unclear to me what point you are making.

threethirtytwo 194 days ago

[flagged]

eru 194 days ago

> This is a relief, honestly. A prior solution exists now, which means the model didn’t solve anything at all. It just regurgitated it from the internet, which we can retroactively assume contained the solution in spirit, if not in any searchable or known form. Mystery resolved.

> Interesting that, in Terence Tao's words: "(though the new proof is still rather different from the literature proof)"

catoc 194 days ago

I firmly believe @threethirtytwo’s reply was not produced by an LLM

mkarliner 194 days ago

Regardless of whether this text was written by an LLM or a human, it is still slop, with a human behind it just trying to wind people up. If there is a valid point to be made, it should be made briefly.

catoc 194 days ago

If the point was triggering a reply, the length and sarcasm certainly worked.

I agree brevity is always preferred. Making a good point while keeping it brief is much harder than rambling on.

But length is just a measure, quality determines if I keep reading. If a comment is too long, I won’t finish reading it. If I kept reading, it wasn’t too long.

johnfn 194 days ago

I suspect this is AI generated, but it’s quite high quality, and doesn’t have any of the telltale signs that most AI generated content does. How did you generate this? It’s great.

AstroBen 194 days ago

Their comments are full of "it's not x, it's y" over and over. Short pithy sentences. I'm quite confident it's AI written, maybe with a more detailed prompt than the average

I guess this is the end of the human internet

prussia 194 days ago

To give them the benefit of the doubt, people who talk to AI too much probably start mimicking its style.

4k93n2 194 days ago

yea, i was suspicious by the second paragraph but was sure once i got to "that’s not engineering, it’s cosplay"

AstroBen 194 days ago

It's also the wording. The weird phrases

"Glorified Google search with worse footnotes" what on earth does that mean?

AI has a distinct feel to it

lxgr 194 days ago

And with enough motivated reasoning, you can find AI vibes in almost every comment you don’t agree with.

For better or worse, I think we might have to settle on “human-written until proven otherwise”, if we don’t want to throw “assume positive intent” out the window entirely on this site.

testdelacc1 194 days ago

Dude is swearing up and down that they came up with the text on their own. I agree with you though, it reeks of LLMs. The only alternative explanation is that they use LLMs so much that they’ve copied the writing style.

plaguuuuuu 194 days ago

I've had that exact phrase pop up from an LLM when I asked it for a more negative code review

threethirtytwo 194 days ago

Your intuition on AI is out of date by about 6 months. Those telltale signs no longer exist.

It wasn't AI generated. But if it was, there is currently no way for anyone to tell the difference.

catlifeonmars 194 days ago

I’m confused by this. I still see this kind of phrasing in LLM generated content, even as recent as last week (using Gemini, if that matters). Are you saying that LLMs do not generate text like this, or that it’s now possible to get text that doesn’t contain the telltale “it's not X, it’s Y”?

comp_throw7 194 days ago

> But if it was there is currently no way for anyone to tell the difference.

This is false. There are many human-legible signs, and there do exist fairly reliable AI detection services (like Pangram).

int_19h 192 days ago

There are no reliable AI detection services. At best they can reliably detect output from popular chatbots running with their default prompts. Beyond that reliability deteriorates rapidly so they either err on the side of many false positives, or on the side of many false negatives.

There's already been several scandals where students were accused of AI use on the basis of these services and successfully fought back.

threethirtytwo 194 days ago

I've tested some of those services and they weren't very reliable.

CamperBob2 193 days ago

If such a thing did exist, it would exist only until people started training models to hide from it.

Negative feedback is the original "all you need."

velox_neb 194 days ago

> It wasn't AI generated.

You're lying: https://www.pangram.com/history/94678f26-4898-496f-9559-8c4c...

Not that I needed pangram to tell me that, it's obvious slop.

threethirtytwo 194 days ago

I wouldn't know how to prove to you otherwise other than to tell you that I have seen these tools show incorrect results for both AI generated text and human written text.

lxgr 194 days ago

Good thing you had a stochastic model backing up (with “low confidence”, no less) your vague intuition of a comment you didn’t like being AI-written.

XenophileJKO 194 days ago

I must be a bot because I love existential dread, that's a great phrase. I feel like they trigger a lot on literate prose.

lxgr 194 days ago

Sad times when the only remaining way to convince LLM luddites of somebody’s humanity is bad writing.

CamperBob2 194 days ago

(edit: removed duplicate comment from above, not sure how that happened)

undeveloper 194 days ago

the poster is in fact being very sarcastic. arguing in favor of emergent reasoning does in fact make sense

threethirtytwo 194 days ago

It's a formal sarcasm piece.

CamperBob2 194 days ago

It's bizarre. The same account was previously arguing in favor of emergent reasoning abilities in another thread ( https://news.ycombinator.com/item?id=46453084 ) -- I voted it up, in fact! Turing test failed, I guess.

(edit: fixed link)

threethirtytwo 194 days ago

I thought the mockery and sarcasm in my piece was rather obvious.

CamperBob2 194 days ago

Poe's Law is the real Bitter Lesson.

habinero 194 days ago

We need a name for the much more trivial version of the Turing test that replaces "human" with "weird dude with rambling ideas he clearly thinks are very deep"

I'm pretty sure it's like "can it run DOOM" and someone could make an LLM that passes this that runs on a pregnancy test

magnio 194 days ago

Pity that HN's ability to detect sarcasm is as robust as that of a sentiment analysis model using keyword-matching.

furyofantares 194 days ago

The problem is more that it's an LLM-generated comment that's about 20x as long as it needed to be to get the point across.

cubefox 194 days ago

It's obviously not LLM-generated.

kleene_op 194 days ago

Phew. This is a relief, honestly!

threethirtytwo 194 days ago

It's not.

Evidence shows otherwise: Despite the "20x" length, many people actually missed the point.

eru 194 days ago

Despite or because?

furyofantares 193 days ago

Oh yeah, there is also a problem with people not noticing they're reading LLM output, AND with people missing sarcasm on here. Actually, I'm OK with people missing sarcasm on here - I have plenty of places to go for sarcasm and wit and it's actually kind of nice to have a place where most posts are sincere, even if that sets people up to miss it when posts are sarcastic.

Which is also what makes it problematic that you're lying about your LLM use. I would honestly love to know your prompt and how you iterated on the post, how much you put into it and how much you edited or iterated. Although pretending there was no LLM involved at all is rather disappointing.

Unfortunately I think you might feel backed into a corner now that you've insisted otherwise but it's a genuinely interesting thing here that I wish you'd elaborate on.

_diyar 194 days ago

I definitely missed the point because of the length, and only realized after I read replies to your comment.

threethirtytwo 194 days ago

Next time I'll write something shorter, or if you don't believe I wrote it... then I'll tell the AI to write something shorter.

quinnjh 194 days ago

Its not just verbose—it's almost a novel. Parent either cooked and capped, or has managed to perfectly emulate the patterns this parrot is stochastically known best for. I liked the pro human vibe if anything.

catlifeonmars 194 days ago

That’s just the internet. Detecting sarcasm requires a lot of context external to the content of any text. In person some of that is mitigated by intonation, facial expressions, etc. Typically it also requires that the reader is a native speaker of the language or at least extremely proficient.

dang 193 days ago

I'm more worried that the best LLMs aren't yet good enough to classify satire reliably.

nurettin 194 days ago

Why not plan for a future where a lot of non-trivial tasks are automated instead of living on the edge with all this anxiety?

rixed 194 days ago

Are you expecting people who can't detect self-delusions to be able to detect sarcasm, or are you just being cruel?
