by mdemare
281 days ago
Just using common sense: if we had a genius with tremendous reasoning ability, total recall, and unlimited lifespan and patience, and he'd read everything the current LLMs have read, we'd expect quite a bit more from him than what we're getting from LLMs now. There are teenagers who win gold medals at the Math Olympiad having trained on < 1M tokens of math texts, never mind the 70T tokens that GPT5 appears to be trained on. That's a difference of nearly eight orders of magnitude. In other words, data scarcity is not a fundamental problem, just a problem for the current paradigm.
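For anyone who wants to check the arithmetic, here is a quick sketch. Both token counts are the rough figures quoted in the comment (the < 1M is treated as an upper bound, the 70T is unverified), and the variable names are just illustrative.

```python
import math

# Figures taken from the comment above; both are rough estimates.
human_tokens = 1e6    # upper bound for math text read by an olympiad medalist
llm_tokens = 70e12    # the 70T figure quoted for GPT5 (not verified here)

ratio = llm_tokens / human_tokens
orders = math.log10(ratio)

print(f"ratio: {ratio:.1e}")
print(f"orders of magnitude: {orders:.2f}")
```

Since the human figure is an upper bound, the real gap would be at least this large.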
If we can reduce the precision of the model parameters by 2-32x without much perceptible drop in performance, we are clearly dealing with something wildly inefficient.
I'm open to the possibility that overparameterization is essential to the training process, much like how MSAA/SSAA oversample the frame buffer to reduce aliasing in the final scaled result (also wildly inefficient, but generally very effective). However, I think for more exotic architectures (spiking / time domain) these rules don't work the same way. You can't simply backpropagate through a recurrent SNN, so much of the prevailing machine learning mindset doesn't even apply.
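To make the precision-reduction point concrete, here is a minimal sketch of symmetric int8 quantization on toy weights, i.e. a 4x reduction from 32-bit floats. The helper names (`quantize_int8`, `dequantize`) and the Gaussian toy weights are assumptions for illustration, not any particular library's method, and weight reconstruction error is only a loose proxy for actual model performance.

```python
import random

def quantize_int8(weights):
    """Symmetric per-tensor quantization to the signed range [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the integers back to approximate float values."""
    return [v * scale for v in q]

def rms(xs):
    """Root mean square of a list of floats."""
    return (sum(x * x for x in xs) / len(xs)) ** 0.5

random.seed(0)
# Toy stand-in for a weight tensor: small zero-mean Gaussian values.
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Relative RMS error introduced by storing 8 bits instead of 32.
rel_err = rms([w - r for w, r in zip(weights, restored)]) / rms(weights)
print(f"relative RMS error after int8 round trip: {rel_err:.4f}")
```

Each restored weight lands within half a quantization step of the original, which is why the error stays small compared with the spread of the weights themselves.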