| HN Mirror



	by candiddevmike 265 days ago
	We need a fundamental paradigm shift beyond transformers. Throwing more compute or data at the problem isn't moving the needle.

2 comments

marcosdumay 265 days ago

Just to point out: there's no more data.

LLMs were always going to bottleneck on one of those two, since compute demand grows crazy quickly with the amount of data, and data is necessarily limited. It turns out people threw crazy amounts of compute at it, so we hit the other limit.
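
A rough sketch of how fast compute grows with data, under two common rules of thumb that are assumptions here and not something stated in this thread: training costs about 6 FLOPs per parameter per token, and a compute-optimal model uses roughly 20 tokens per parameter. With both assumed, compute scales with the square of the token count. The function name and constants are illustrative only.

```python
# Assumed rules of thumb (not from the thread):
FLOPS_PER_PARAM_PER_TOKEN = 6   # training FLOPs ~ 6 * params * tokens
TOKENS_PER_PARAM = 20           # rough compute-optimal tokens-per-parameter ratio


def training_flops(tokens: float) -> float:
    """Estimate training FLOPs when model size is scaled along with the data."""
    params = tokens / TOKENS_PER_PARAM
    return FLOPS_PER_PARAM_PER_TOKEN * params * tokens


if __name__ == "__main__":
    for tokens in (1e12, 1e13, 1e14):
        print(f"{tokens:.0e} tokens -> {training_flops(tokens):.1e} FLOPs")
```

Under these assumptions, doubling the data quadruples the compute bill, which is one way to read "grows crazy quickly".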


Mistletoe 265 days ago

Yeah, I’m constantly reminded of a quote about this: you can’t make another internet. LLMs already digested the one we have.


bubblelicious 265 days ago

Epoch has a pretty good analysis of bottlenecks here:

https://epoch.ai/blog/can-ai-scaling-continue-through-2030

There is plenty of data left; we don’t just train on crawled text data. Power constraints may turn out to be the real bottleneck, but we’re like 4 orders of magnitude away.


bigyabai 265 days ago

Synthetic data works.


marcyb5st 265 days ago

There's a limit to that, according to https://www.nature.com/articles/s41586-024-07566-y . Basically, if you use an LLM to augment a training dataset, the resulting model gets "dumber" with every subsequent generation, and I am not sure how you can generate synthetic data for a language model without using a language model.


yorwba 265 days ago

Synthetic data doesn't have to come from an LLM. And that paper only showed that if you train on a random sample from an LLM, the resulting second LLM is a worse model of the distribution that the first LLM was trained on. When people construct synthetic data with LLMs, they typically do not just sample at random, but carefully shape the generation process to match the target task better than the original training distribution.
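
A toy analogue of the random-sampling case, as a minimal sketch: it swaps the LLM for a one-dimensional Gaussian, so it is not the paper's experiment. Each "generation" is fit only on samples drawn from the previous generation's fit. The function name, sample size, and generation count are arbitrary choices for illustration.

```python
import random
import statistics


def recursive_fit(generations: int, sample_size: int, seed: int = 0) -> list[float]:
    """Refit a Gaussian on samples drawn from the previous generation's fit.

    Returns the fitted standard deviation after each generation,
    starting with the original distribution's value.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real" distribution
    history = [sigma]
    for _ in range(generations):
        # Plain random sampling from the current model, with no curation.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)  # maximum-likelihood estimate
        history.append(sigma)
    return history


if __name__ == "__main__":
    history = recursive_fit(generations=200, sample_size=10)
    for gen in (0, 50, 100, 200):
        print(f"generation {gen:3d}: sigma = {history[gen]:.3e}")
```

With small samples the fitted spread tends to shrink over generations, so later fits describe the original distribution worse and worse. The sketch says nothing about generation that is shaped or filtered toward a target task, which is the case described above.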


bubblelicious 265 days ago

And you don’t think that’s already happening? Also where is your evidence for this?


bigyabai 265 days ago

> Also where is your evidence for this?

The fact that "scaling laws" didn't scale? Go open your favorite LLM in a hex editor; oftentimes half of the larger tensors are just null bytes.
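
For anyone who wants to check the null-byte claim without a hex editor, here is a minimal sketch that counts 0x00 bytes across a whole file. It assumes nothing about the checkpoint format, so it cannot attribute zeros to individual tensors, and a zero byte is not the same thing as a zero-valued weight. The function name is made up for illustration.

```python
import sys


def zero_byte_fraction(path: str, chunk_size: int = 1 << 20) -> float:
    """Return the fraction of bytes in the file that are 0x00."""
    zeros = 0
    total = 0
    with open(path, "rb") as f:
        # Read in chunks so multi-gigabyte weight files do not need to fit in memory.
        while chunk := f.read(chunk_size):
            zeros += chunk.count(0)
            total += len(chunk)
    return zeros / total if total else 0.0


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{path}: {zero_byte_fraction(path):.1%} null bytes")
```

Run it with one or more weight files as arguments. Headers, padding, and alignment can all contribute zero bytes, so a high fraction alone would not settle the argument either way.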


bubblelicious 265 days ago

Show me a paper. This makes no sense; of course scaling laws are scaling.
