by energy123 290 days ago
> they've trained on < 1M tokens of math texts, never mind the 70T tokens that GPT5 appears to be trained on.

Somewhat apples and oranges given billions of years of evolution behind that human. GPT-5 started off as a blank slate.


This comparison is absolute nonsense.

"How could a telescope see Saturn? Human eyes have billions of years of evolution behind them, and we only made telescopes a few hundred years ago, so they should be much weaker than eyes."

"How can Stockfish play chess better than a human? The human brain has had billions of years of evolution."

Evolution is random and slow, and it does not guarantee that we arrive at even a local optimum.

They're not saying that LLMs should be better than smart teenagers; they're saying that smart teenagers can solve some problems without needing massive amounts of data, so apparently those problems are technically solvable without those amounts of data.

Yes. It is astonishing that LLMs can solve problems that only a handful of very smart teenagers can solve, but LLMs do it by consuming a million times as much content as those teenagers. Running out of data is not a reason for despair.

Also consider that during training, LLMs spend much less time on processing, say, TAOCP (Knuth), SICP (Abelson, Sussman, and Sussman), or Probability Theory (Jaynes) than on the entirety that is r/Frugal.

Twenty thick books turn a smart teenager into a graduate with an MSc. That's what, 10 million tokens?
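A back-of-envelope check of the numbers in this thread, as a Python sketch. The 500k-tokens-per-book figure is an assumption (not from the comment), and the 70T figure is only the estimate quoted at the top of the thread, not a confirmed number:

```python
# Back-of-envelope token comparison; every input is a rough assumption.
BOOKS = 20                  # "20 thick books" from the comment
TOKENS_PER_BOOK = 500_000   # assumption: one thick textbook is ~500k tokens
GPT5_TOKENS = 70e12         # the 70T estimate quoted in the thread (unverified)

# Total tokens in the hypothetical MSc reading list.
human_tokens = BOOKS * TOKENS_PER_BOOK

# How many times more text the model saw, under these assumptions.
ratio = GPT5_TOKENS / human_tokens

print(f"human curriculum: {human_tokens:,} tokens")
print(f"GPT-5 / human ratio: {ratio:,.0f}x")
```

Under these assumptions the gap is about 7 million times, the same order of magnitude as the "million times as much content" mentioned above.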

When we read difficult, important texts, we reflect on them, do the exercises, discuss them, etc. We don't know how to make an LLM do that in a way that improves it. Yet.

What comparison? I was arguing against a comparison.

To be fair, GPT-5 didn't start off as a blank slate. The architecture probably encodes a lot, much as DNA does. The former requires human writing to decompress into a human-like thing; the latter requires the Earth environment and a woman to decompress into a human organism.

But it's indeed apples and oranges. There's no good way to estimate the information encoded by the GPT architecture compared to human DNA. We just have to be empirical and look at what the thing can do.