by logicchains 4 days ago
>But no human has read anywhere near as much as even relatively small Chinchilla-optimal models

They're missing that humans don't consume raw text. They consume a non-stop stream of high-resolution, high-frame-rate audio and video. If you tokenized the input to human eyes and ears over the first few years of life, it would be more data than even the largest LLMs are trained on.


I didn't include it in my summary (it took me an hour to read the whole thing, so obviously a lot had to be cut), but the article does actually address the "high resolution" argument in a three-paragraph bullet point under the "Sample Inefficiency" subheading: https://gwern.net/llm-catapult#sample-inefficiency If you read it on a 4K screen at 120 FPS, you should be able to take in its information content in less than a microsecond.
They "address" it by making the false statement that the video stream is highly predictable. Sure, you might be able to predict 99% of the video stream (for which you'd need a physics model, which undercuts the whole point about how fast babies learn), but the remaining 1% still amounts to terabytes, if not petabytes, per year.
I think this is addressed in the blog post:

  And on the human side, disabled people are not much less intelligent than normal humans: deaf/blind people are much worse at language tasks, but their fluid intelligence often remains normal. If the sensory bandwidth were so critical, this would be impossible.