Hacker News
by sigmoid10 811 days ago
The data volume is actually not that different once you account for all senses and how many years it takes for a human to become useful. The interesting thing would be how the human brain filters out the unimportant information as it develops.

That's a distinction without a difference. The majority of data is from a distribution that's already been sampled multiple times.

E.g. how often does a baby go out and experience something novel? The majority of its time is spent getting the same stimulus over and over again, as anyone listening to children's television can attest.

Humans learn in fundamentally different ways to our current systems and information poverty is not a problem for us.

And what do you think epochs in machine learning are? Or why more modern training efforts (i.e. for LLMs) are focussing hard on deduplicating scraped data?
Why don't you tell me instead of asking questions that you surely know the answer to?
It was rhetorical. But in case you actually don't know: what you described (i.e. multi-sampling) has been common practice in ML for ages. Only now are the latest models getting so big that people are actually trying hard to move away from this idea, because it would take a human lifetime in wall-clock time to train a cutting-edge LLM on similar data streams.
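To make the two ideas concrete, here is a minimal sketch, not anyone's actual training pipeline. It assumes exact-match deduplication after simple whitespace and case normalisation (real LLM pipelines typically also use fuzzy or near-duplicate methods), and the names `deduplicate` and `count_presentations` are hypothetical helpers made up for illustration. Epochs resample the same data on purpose; deduplication removes the resampling you did not ask for.

```python
import hashlib


def deduplicate(samples):
    """Drop exact duplicates after whitespace/case normalisation, keeping first occurrences."""
    seen = set()
    unique = []
    for text in samples:
        # Assumption: normalise by lowercasing and collapsing whitespace before hashing.
        normalised = " ".join(text.lower().split())
        key = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


def count_presentations(samples, epochs):
    """Count how often each sample would be shown over `epochs` full passes of the data."""
    counts = {}
    for _ in range(epochs):
        for text in samples:
            counts[text] = counts.get(text, 0) + 1
    return counts


scraped = ["The cat sat.", "the cat  sat.", "A dog barked.", "The cat sat."]
unique = deduplicate(scraped)
print(unique)
print(count_presentations(unique, epochs=3))
```

Without the deduplication step, the repeated sentence would be seen several times per epoch, so its effective weight in training would be a side effect of the scrape instead of a deliberate choice.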