|
|
|
|
|
by ben_w
735 days ago
|
|
> do people really believe GPT-4 was trained with more data than a 4 year old? I think it was; the guesstimate I've seen is GPT-4 was trained on 13e12 tokens, that over 4 years is 8.9e9/day or about 1e5/s. Then it's a question of how many bits per token — my expectation is 100k/s is more than the number of token-equivalents we experience, even though it's much less than the bitrate even of just our ears let alone our eyes. |
|
The analogies get a little blurry here, but perhaps we can draw a distinction between information that an infant gets from their higher-level senses (e.g. sight, smell, touch, etc) versus any lower-level biological processes (genetics, epi-genetics, developmental processes, and so on).
The main point is that there is a fundamental difference: LLMs have very little prior knowledge [1] while humans contain an immense amount of information even before they begin learning through the senses.
We need to look at the billions of years of biological evolution, millions of years of cultural evolution, and the immense amounts of environmental factors, all which shape us before birth and before any “learning” occurs.
[1] The model architecture probably counts as hard-coded prior knowledge contained before the model begins training, but it is a ridiculously small amount of information compared to the complexity of living organisms.