

by kazinator 310 days ago

> conclusions gathered from toy models and implying this generalises to production LLMs is useless

You are just trotting out the tired argument that model size magically fixes the issues, rather than merely improving the mirage, and so nothing can be known about models with M parameters by studying models with N < M parameters.

Given enough parameters, a miraculous threshold is reached whereby LLMs switch from interpolating to extrapolating.

Sure!


ricardobeat 310 days ago

That’s what has been seen in practice though. SOTA LLMs have been shown again and again to solve problems unseen in their data set; and despite their shortcomings they have become extremely useful for a wide variety of tasks.


kazinator 310 days ago

Even a tiny model for, say, classifying hand-written digits, will correctly classify digits that didn't appear in its training data. (Otherwise it wouldn't be very useful.) That classification is interpolative; the hand-written digit lands in the space of the training data.

Every result is explainable as having come from training data. That's the null hypothesis.

The alternative hypothesis is that it's not explainable as having come from training data. That's a hard-to-believe, hard-to-prove negative.

You don't get anything out of any computational process that you didn't put in.


zwaps 310 days ago

You actually do not classify digits that didn't appear, you classify different pictures of digits that DID appear.

Similarly, LLMs do not invent a new way of reasoning about problems or language. They do, however, apply these to unseen problems.

LLMs are one level of abstraction up, but it's a very interesting level of abstraction.


bumby 310 days ago

>you classify different pictures of digits that DID appear.

Are you implying models that classify hand-written digits don’t generalize and only work on training data?


kazinator 309 days ago

No, that is false; a neural net trained on a decent set of handwritten digits will recognize a newly handwritten digit.

I'm saying that this is a strawman version of "not in the training data". The newly handwritten digit is squarely the same sort of stuff that is in the training data: an interpolation.

We are not surprised when we fit a curve to a bunch of points and then find points on the curve that are not exactly any of those points, but are located among the points.

Go too far outside of the cluster of points though and the curve is a hallucination.

This is the intuition behind interpolate vs extrapolate.
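
The curve-fitting picture above can be tried out with a small sketch. Everything in it is an illustrative assumption rather than something from this thread: the "training data" is eight samples of sin(x) on [0, pi], the "model" is the polynomial that passes through them, and the helper names lagrange_fit and abs_error are made up for the example.

```python
import math

def lagrange_fit(xs, ys):
    """Return a function that evaluates the unique polynomial through (xs, ys)."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly

# Hypothetical "training data": 8 evenly spaced samples of sin(x) on [0, pi].
n = 8
train_x = [math.pi * k / (n - 1) for k in range(n)]
train_y = [math.sin(x) for x in train_x]
curve = lagrange_fit(train_x, train_y)

def abs_error(x):
    """Absolute gap between the fitted curve and the true function at x."""
    return abs(curve(x) - math.sin(x))

inside = abs_error(1.0)            # a new point located among the training points
outside = abs_error(3 * math.pi)   # a point far outside the cluster of points

print(f"error inside the data range:  {inside:.2e}")
print(f"error outside the data range: {outside:.2e}")
```

Under these assumptions the fitted curve stays very close to sin(x) at the unseen interior point and drifts far from it at the distant one. It illustrates the curve-fitting analogy only and says nothing by itself about where LLM inputs fall.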


loosetypes 310 days ago

Mind linking any examples (or categories) of problems that are definitively not in pre-training data but can still be solved by LLMs? Preferably something factual rather than creative; I'm genuinely curious.

Dumb question but anything like this that’s written about on the internet will ultimately end up as training fodder, no?


dcre 310 days ago

How about the International Math Olympiad?

https://arstechnica.com/ai/2025/07/google-deepmind-earns-gol...


mvieira38 310 days ago

You're saying they don't use math textbooks and math forums to train LLMs, then?


dcre 310 days ago

The problems are not in textbooks. I’m curious what would count as an out of distribution problem for you. Only problems no one knows how to solve?


Workaccount2 310 days ago

You can apply this same argument to humans; 99.999% of people will not be able to escape it.

In the case of the Math Olympiad, the students who take it grind hours a day for months on practice problems and past Olympiad problems.


boxed 310 days ago

> SOTA LLMs have been shown again and again to solve problems unseen in their data set

We have no idea what the training data is though, so you can't say that.

> and despite their shortcomings they have become extremely useful for a wide variety of tasks.

That seems like a separate question.


zwaps 310 days ago

I have applied O3 pro to abandoned, never-published research of mine that lives in an intersection that is as entirely novel as it is uninteresting.

O3 pro (but not O3) was successfully able to apply reasoning and math to this domain in interesting ways, much like an expert researcher in these areas would.

Again, the field and the problem are, with 100% certainty, OOD relative to the training data.

However, the techniques and reasoning methods are of course learned from data. But that's the point, right?


staticman2 309 days ago

The paper is evaluating how well an LLM can handle novelty, and on the paper's terms you need to calculate or otherwise somehow deduce the degree or type of novelty rather than simply describing your never published research as novel.

I don't even know that this is possible without seeing the training data. Hence the difficulty in describing how good at "reasoning" O3 Pro is.

The most novel problem would presumably be something only a Martian could understand, written in an alien language; the least novel problem would be a basic question taught in preschool, like "what color is the sky?"

Your research falls somewhere between those extremes.


boxed 309 days ago

LLMs don't learn reasoning. At all. They are statistical language models. Nothing else. If they get math right, it's because correct math is more statistically probable given the training data; they can't actually do math. This should be pretty clear from all the "how many Rs are there in strawberry" type examples.
