Hacker News

by mdemare 281 days ago

Just using common sense, if we had a genius, who had tremendous reasoning ability, total recall of memories, and an unlimited lifespan and patience, and he'd read what the current LLMs have read, we'd expect quite a bit more from him than what we're getting now from LLMs.

There are teenagers that win gold medals on the math olympiad - they've trained on < 1M tokens of math texts, never mind the 70T tokens that GPT5 appears to be trained on. A difference of eight orders of magnitude.

In other words, data scarcity is not a fundamental problem, just a problem for the current paradigm.
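As a rough check on the "eight orders of magnitude" figure, here is a minimal sketch using the two token counts quoted above. Both numbers are the commenter's estimates, not measured values, and the variable names are only illustrative.

```python
import math

# Token counts as quoted in the comment above; both are rough estimates.
olympiad_tokens = 1e6   # upper bound given for a teenager's math reading
llm_tokens = 70e12      # figure the comment attributes to GPT5

ratio = llm_tokens / olympiad_tokens
orders_of_magnitude = math.log10(ratio)

print(f"ratio: {ratio:.1e}")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

Under these assumptions the gap comes out a little under eight orders of magnitude, so "eight" is a fair rounding.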

5 comments

bob1029 280 days ago

I think quantization is the simplest canary.

If we can reduce the precision of the model parameters by 2~32x without much perceptible drop in performance, we are clearly dealing with something wildly inefficient.

I'm open to the possibility that overparameterization is essential as part of the training process, much like how MSAA/SSAA oversample the frame buffer to reduce information aliasing in the final scaled result (also wildly inefficient but generally very effective). However, I think for more exotic architectures (spiking / time domain) these rules don't work the same way. You can't backpropagate through a recurrent SNN, so much of the prevailing machine learning mindset doesn't even apply.
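To make the quantization point concrete, here is a toy sketch of symmetric uniform quantization in plain Python. It is only an illustration under stated assumptions: the "weights" are random Gaussian numbers rather than real model parameters, the function names are hypothetical, and the script measures rounding error on the weights, not any drop in model performance.

```python
import random

def quantize_dequantize(weights, bits):
    """Symmetric uniform quantization: snap each weight to a signed integer
    grid with 2**(bits-1) - 1 positive levels, then map back to floats."""
    levels = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]

def rms_error(a, b):
    """Root-mean-square difference between two equal-length lists."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

random.seed(0)
# Toy stand-in for model parameters (an assumption, not real weights).
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

for bits in (8, 4, 2):
    err = rms_error(weights, quantize_dequantize(weights, bits))
    print(f"{bits}-bit: RMS rounding error {err:.4f}")
```

The rounding error grows as the bit width shrinks. Whether a given error level is "perceptible" in a real model is an empirical question this sketch does not answer.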

jebarker 280 days ago

It’s not clear that the inefficiency of the current paradigm is in the neural net architectures. It seems just as likely that it’s in the training objective.

qcnguy 280 days ago

Right. The objective is "correctly predict the entire training set", where that training set contains literally everything. So the objective becomes to speak every human language, every programming language, to understand every topic, to master every weird sub-genre of culture. That's an inherently very inefficient training objective if you just want an AI that can do some specific tasks. It's the whole insight behind models specific to summarization, text extraction, patch merging etc.

And don't forget the noise. If you look at the Anthropic papers it's clear from the examples they give that the dataset is still incredibly noisy even after extensive cleaning efforts. A lot of those parameters are being wasted trying to predict garbage outputs from HTML scraping gone wrong.

flooo 281 days ago

Now consider that the genius cannot physically interact with the world or the people therein, and uses her eyes only for reading text.

nosianu 281 days ago

Yes - we train only on a subset of human communication, the one using written symbols (even voice has much much more depth to it), but human brains train on the actual physical world.

Human students who have only learned some new words but have not (yet) even begun to really comprehend a subject will also just throw around random words and sentences that sound great but have no basis in reality.

For the same sentence, for example, "We need to open a new factory in country XY", the internal model lighting up inside the brain of someone who has actually participated when this was done previously will be much deeper and larger than that of someone who only heard about it in their coursework. That same depth is zero for an LLM, which only knows the relations between words and has no representation of the world. Words alone cannot even begin to represent what the model created from the real-world sensors' data, which on top of the direct input is also based on many-times-compounded and already-internalized prior models (nobody establishes that new factory as a newborn baby with a fresh neural net; in fact, even a newborn has inherited instincts that are all based on accumulated real-world experiences, including the very complex structure of the brain).

Somewhat similarly, consider situations reported in comments like this one (client or manager vastly underestimating the effort required to do something): https://news.ycombinator.com/item?id=45123810 The internal model of a task held by those far removed from actually doing it is very small compared to the internal models of those doing the work, so their attempts to gauge the required effort fall spectacularly short if they also lack that awareness.

imtringued 280 days ago

Also the geniuses get beaten with a stick if they don't memorize and perfectly reproduce the text they've read.

Fargren 280 days ago

I'm not sure what point you are trying to make. Are you saying that the missing piece for making LLMs better at learning is to make them capable of interacting with the outside world? Give them actuators and sensors?

energy123 280 days ago

> they've trained on < 1M tokens of math texts, never mind the 70T tokens that GPT5 appears to be trained on.

Somewhat apples and oranges given billions of years of evolution behind that human. GPT-5 started off as a blank slate.

TheDong 280 days ago

This comparison is absolute nonsense.

"How could a telescope see saturn, human eyes have billions of years of evolution behind them, and we only made telescopes a few hundred years ago, so they should be much weaker than eyes"

"How can StockFish play chess better than a human, the human brain has had billions of years of evolution"

Evolution is random, slow, and does not mean we arrive at even a local optimum.

Vinnl 280 days ago

They're not saying that LLMs should be better than smart teenagers; they're saying that smart teenagers can solve some problems without needing massive amounts of data, so apparently those problems are technically solvable without those amounts of data.

mdemare 280 days ago

Yes. It is astonishing that LLMs can solve problems that only a handful of very smart teenagers can solve, but LLMs do it by consuming a million times as much content as those teenagers. Running out of data is not a reason for despair.

Also consider that during training LLMs spend much less time on processing, say, TAOCP (Knuth), or SICP (Abelson, Sussman, and Sussman), or Probability Theory (Jaynes) than on the entirety that is r/Frugal.

20 thick books turn a smart teenager into a graduate with an MSc. That's what, 10 million tokens?

When we read difficult, important texts, we reflect on them, make exercises, discuss them, etc. We don't know how to make an LLM do that in a way that improves it. Yet.
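A back-of-the-envelope sketch of the "20 thick books, 10 million tokens" estimate above. The per-book figures and the tokens-per-word ratio are illustrative assumptions, not numbers from the thread, and real tokenizers vary.

```python
# All per-book figures below are illustrative assumptions, not from the thread.
books = 20
pages_per_book = 700     # assumed length of a "thick" textbook
words_per_page = 500     # assumed density for a technical text
tokens_per_word = 1.3    # rough rule of thumb for English; varies by tokenizer

total_tokens = books * pages_per_book * words_per_page * tokens_per_word
print(f"approx. {total_tokens / 1e6:.1f} million tokens")
```

With these assumptions the total lands in the same ballpark as the 10 million guess. Halving or doubling any one input still keeps it within the same order of magnitude.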

energy123 280 days ago

What comparison? I was arguing against a comparison.

txrx0000 280 days ago

To be fair, GPT-5 didn't start off as a blank slate. The architecture probably encodes a lot, much like how DNA encodes a lot. The former requires human writing to decompress into a human-like thing, the latter requires the Earth environment and a woman to decompress into a human organism.

But it's indeed apples and oranges. There's no good way to estimate the information encoded by the GPT architecture compared to human DNA. We just have to be empirical and look at what the thing can do.

voxic11 280 days ago

Maybe human brains are constantly generating (and training on) massive amounts of synthetic data and that is how they get so smart?

timeinput 280 days ago

You mean those like 8 hours of ~~nightmares~~ dreams I have every night?

garspin 280 days ago

I doubt it. Brains run at only a few operations per second... GPUs at TFLOPS. There just isn't enough bandwidth.

My brain only needs to get mugged in a dark alley by a guy in a hoodie once to learn something.

anthonypasq 280 days ago

This sentence really struck me in a particular way. Very interesting. It does seem like thoughts/stream of consciousness is just your brain generating random tokens to itself and learning from it lol.

jimbokun 280 days ago

What experiment could be run to test this hypothesis?

petralithic 280 days ago

Humans are not tabulae rasae though. Evolution has hardwired our genius over millions of years.