Hacker News
by Lauris100 262 days ago
I do feel that AI has been overhyped a bit for now, but what happens when we scale our electricity and GPU production 10x, 100x, go nuclear, etc., and can 100x AI models? Let's see. It's too early to tell, really.
2 comments

Evidence suggests we’re data-scarce rather than compute-scarce. Nowadays models are trained on the output of other models, and we’ve started to see model collapse.
The amusing thing is that it takes several orders of magnitude less data to bring up a human to reasonably competent adulthood, which means that there is something fundamentally flawed in the brute-force approach to training LLMs, if the goal is to get to human-equivalent competency.

Also, the fact that 30B models, while less capable than 300B+ models, are not quite one whole order of magnitude less capable suggests that, all else being equal, capability scales sub-linearly with parameter count. It's even more flagrant with 4B models, honestly. The fact that those are serviceable at all is kind of amazing.
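
To put a rough number on "sub-linear": if you model capability as proportional to (parameter count)^alpha, then alpha < 1 is the sub-linear case. A minimal sketch, assuming capability can be squeezed into a single scalar score at all (a big assumption). The function name is mine and the "3x more capable" figure is made up purely for illustration, not a measurement.

```python
import math

def implied_exponent(param_ratio: float, capability_ratio: float) -> float:
    """Return alpha such that capability_ratio == param_ratio ** alpha."""
    return math.log(capability_ratio) / math.log(param_ratio)

# Hypothetical numbers, for illustration only: a model with 10x the
# parameters (e.g. 300B vs 30B) that scores 3x higher on some scalar
# capability measure.
alpha = implied_exponent(param_ratio=10.0, capability_ratio=3.0)
print(f"implied exponent: {alpha:.2f}")  # prints "implied exponent: 0.48"
```

Anything below 1.0 here means each extra order of magnitude in parameters buys less than an order of magnitude in capability, which is the hunch above stated as a number.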

Both factors add up to the hunch that a point of diminishing returns must soon be met, if it hasn't already. But as long as no one asks where all the money went I suppose we can keep going for a while still. Just a few more trillions bro, we're so close.

I suspect there’s a good deal baked into a human brain that we’re not fully aware of. So babies aren’t starting from zero; they have billions of years of evolution to bootstrap from.

For example, language might not be baked in, but the software for quickly learning human languages certainly is.

To take a simple example, spiders aren’t taught by their mothers how to spin webs. They do it instinctively. So that means the software for spinning a web is in the spider’s DNA.

I don't think we can even do a "Handbook of LLM techniques" at this point and have something thick enough to raise a monitor. It's all in the data. First they started with copyrighted material, then public sources (maybe private sources as well), and now they are hammering every server in existence to get moooorre...

> what happens when we scale our electricity and GPU production 10x, 100x

Nothing interesting without some fundamental breakthrough, IMO. Model/agent providers add another level of "thinking" that uses 10x the energy for a 10% gain on benchmarks.

They also typically dilute human input with self-talk. You can steer them in a direction, but the internal conversation can convince itself otherwise. It’s not frequent but it’s very frustrating when it happens.