| HN Mirror



	by Volundr 933 days ago
	> Babies learn to talk on way less data than the entire internet.

	Is this actually true? My gut check says yes, but I'm also unaware of any meaningful way to quantify the volume of sensor data processed by a baby (or anyone else, for that matter), and it wouldn't shock me if, were we able to measure it, it turned out to be a huge volume.

2 comments

quickthrower2 933 days ago

Ah yes. I should be more precise: less textual data. Of course other data sources are plentiful, including internal and external sensory input.


pyuser583 933 days ago

Babies in ancient societies certainly had less exposure to written language, a much smaller vocabulary, less exposure to music, etc.


Volundr 933 days ago

Sure, the breadth is (maybe) smaller, but the question is volume. Babies get years of people talking around them, as well as data from their own muscles and vocalizations fed back to them. Is the volume they have consumed by the point they begin talking actually less than the volume consumed by an LLM?


pyuser583 932 days ago

If you’re talking about babies in ancient societies (which I am), the answer is absolutely yes. They were exposed to much less language, and much less sound, than we are.


Volundr 932 days ago

Really? How much less? I'm far from convinced that if you sum up the sheer volume of noises heard, as well as the other neurological inputs that go into learning to speak (e.g. proprioception), you'd come out with a smaller number than what LLMs are trained on, but I'm open to any real data on this.
