Hacker News
by candiddevmike 218 days ago
We need a fundamental paradigm shift beyond transformers. Throwing more compute or data at it isn't moving the needle.

Just to point out: there's no more data.

LLMs were always going to bottleneck on one of those two, since compute demand grows crazy quickly with the amount of data, and data is necessarily limited. Turns out people threw crazy amounts of compute at it, so we hit the other limit.
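
To put rough numbers on "grows crazy quickly", here is a back-of-the-envelope sketch. It assumes two common rules of thumb that are not from this thread: training compute is about 6 × parameters × tokens FLOPs, and compute-optimal training uses roughly 20 tokens per parameter. The function name is just illustrative. Under those assumptions, compute grows with the square of the dataset size:

```python
# Rule-of-thumb assumptions (not stated in the thread):
#   training FLOPs      ~ 6 * params * tokens
#   compute-optimal fit ~ 20 tokens per parameter
TOKENS_PER_PARAM = 20


def training_flops(tokens: float) -> float:
    """Approximate training FLOPs for a compute-optimal model on `tokens` tokens."""
    params = tokens / TOKENS_PER_PARAM
    return 6 * params * tokens


if __name__ == "__main__":
    for tokens in (1e12, 1e13, 1e14):
        print(f"{tokens:.0e} tokens -> ~{training_flops(tokens):.1e} FLOPs")
```

So under these assumptions, 10x more data means roughly 100x more compute, which is why either limit gets hit fast.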

Yeah, I’m constantly reminded of a quote about this: you can’t make another internet. LLMs already digested the one we have.
Epoch has a pretty good analysis of bottlenecks here:

https://epoch.ai/blog/can-ai-scaling-continue-through-2030

There is plenty of data left; we don’t just train on crawled text data. Power constraints may turn out to be the real bottleneck, but we’re like 4 orders of magnitude away.

Synthetic data works.
There's a limit to that, according to https://www.nature.com/articles/s41586-024-07566-y. Basically, if you use an LLM to augment a training dataset, the model becomes "dumber" with every subsequent generation, and I am not sure how you can generate synthetic data for a language model without using a language model.
Synthetic data doesn't have to come from an LLM. And that paper only showed that if you train on a random sample from an LLM, the resulting second LLM is a worse model of the distribution that the first LLM was trained on. When people construct synthetic data with LLMs, they typically do not just sample at random, but carefully shape the generation process to match the target task better than the original training distribution.
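
A toy analogue of the "train on a random sample from the previous model" case, as a sketch only: this is a one-dimensional Gaussian refit on its own samples, not the paper's experiment and not an LLM. The function name and the defaults (10 samples per generation, 500 generations) are made up for illustration.

```python
import random
import statistics


def recursive_fit(generations=500, n=10, seed=0):
    """Refit a Gaussian to n random samples drawn from the previous fit.

    Returns the fitted standard deviation at each generation,
    starting from the "true" distribution N(0, 1).
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    sigmas = [sigma]
    for _ in range(generations):
        # Unfiltered random sampling: no shaping or curation of the data.
        samples = [rng.gauss(mu, sigma) for _ in range(n)]
        mu = statistics.fmean(samples)
        sigma = statistics.stdev(samples)
        sigmas.append(sigma)
    return sigmas


if __name__ == "__main__":
    sigmas = recursive_fit()
    print(f"generation 0 sigma: {sigmas[0]}")
    print(f"generation {len(sigmas) - 1} sigma: {sigmas[-1]}")
```

With small samples, the fitted spread tends to drift toward zero over many generations, so later fits lose the tails of the original distribution. That only covers unfiltered random sampling; it says nothing about synthetic data that is shaped or filtered for a target task.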
And you don’t think that’s already happening? Also where is your evidence for this?
> Also where is your evidence for this?

The fact that "scaling laws" didn't scale? Go open your favorite LLM in a hex editor; oftentimes half the larger tensors are just null bytes.
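
For anyone who would rather measure this than eyeball a hex editor, a minimal sketch that counts the fraction of zero bytes in a file. The function name and chunk size are illustrative, and it scans the raw file as a whole; it does not parse the checkpoint format or look at individual tensors.

```python
def null_byte_fraction(path, chunk_size=1 << 20):
    """Return the fraction of bytes in the file at `path` that are 0x00."""
    total = 0
    zeros = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
            zeros += chunk.count(b"\x00")
    return zeros / total if total else 0.0
```

Note that a zero byte is not the same thing as a zero-valued weight: padding, alignment, and low-order bytes of small numbers can all show up as 0x00, so a per-tensor check would need a format-aware reader.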

Show me a paper. This makes no sense; of course scaling laws are scaling.