Hacker News new | ask | show | jobs
by Q6T46nT668w6i3m 262 days ago
Evidence suggests we’re data rather than compute scarce. Nowadays models are trained using other models and we’ve started to see domain collapse.
2 comments

The amusing thing is that it takes several orders of magnitude less data to bring up a human to reasonably competent adulthood, which means that there is something fundamentally flawed in the brute-force approach to training LLMs, if the goal is to get to human-equivalent competency.

Also the fact that 30B models, while less capable than 300B+ models, are not quite one whole order of magnitude less capable, suggests that all things being equal, capability scales sub-linearly to parameter count. It's even more flagrant with 4B models, honestly. The fact that those are serviceable at all is kind of amazing.

Both factors add up to the hunch that a point of diminishing returns must soon be met, if it hasn't already. But as long as no one asks where all the money went I suppose we can keep going for a while still. Just a few more trillions bro, we're so close.

I suspect there’s a good deal baked into a human brain we’re not fully aware of. So babies aren’t starting from zero, they have billions of years of evolution to bootstrap from.

For example, language might not be baked in, but the software for quickly learning human languages certainly is.

To take a simple example, spiders aren’t taught by their mothers how to spin webs. They do it instinctively. So that means the software for spinning a web is in the spider’s DNA.

I don't think we can even do a "Handbook of LLM techniques" at this point and have something thick enough to raise a monitor. It's all in the Data. First they started with copyrighted material, then public sources (maybe private sources as well), and now they are hammering every server in existence to get moooorre..