|
|
|
|
|
by Balinares
260 days ago
|
|
That's a good read also shared by another poster above, thanks! If I'm reading this right, it contextualizes, but doesn't negate the findings from that paper. I've got a major aesthetic problem with the fact LLMs require this much training data to get where they are, namely, "not there yet"; it's brute force by any other name, and just plain kind of vulgar. Although more importantly it won't scale much further. Novel architectures will have to feature in at some point, and I'll gladly take any positive result in that direction. |
|
Poor sample efficiency of the current AIs is a well known issue - but you should keep in mind what kind of grisly process was required to give you the architecture that makes you as sample efficient as you are.
We don't know yet what kind of architectural quirks enable this sample efficiency in the human brain. It could be something like a non-random initialization process that confers the right inductive biases, a more efficient optimizer, recurrent background loops... or just more raw juice.
It might be that one biological neuron is worth 10000 LLM weights, and a big part of how the brain is so sample efficient is that it's hilariously overparametrized.