by midland_trucker 1157 days ago
> Dataset size is not relevant to predicting the loss threshold of LLMs. You can keep pushing loss down by using the same sized dataset, but increasingly larger models.

DeepMind and others would disagree with you! In fact, no one really knows.

[1] https://www.deepmind.com/publications/an-empirical-analysis-...

I don't recall the Chinchilla paper disputing my point. They establish "training-compute optimal" scaling laws, but none of their findings suggest that loss hits any kind of asymptote.
Perhaps we're talking past each other: is "loss threshold" a specific term in the LLM literature?

I'm merely pointing out that the debate over whether we are compute- or data-limited (OP) is far from settled; there are lots of compelling theories on the relationship between the two.
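
For concreteness, here is a minimal sketch of the kind of parametric fit the Chinchilla paper uses, L(N, D) = E + A/N^alpha + B/D^beta, with N parameters and D training tokens. The constants are the fitted values as commonly quoted from that paper and should be treated as assumptions to check against it; the function names are made up for illustration. Whether a fit like this extrapolates far outside the measured range is exactly the open question, so read it as an illustration of one model of the trade-off, not as a settled answer.

```python
# Parametric loss of the form L(N, D) = E + A / N**alpha + B / D**beta.
# Constants: fitted values as commonly quoted from Hoffmann et al. (2022);
# assumptions here, verify against the paper before relying on them.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def parametric_loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss for a model with n_params trained on n_tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def fixed_data_limit(n_tokens: float) -> float:
    """Limit of parametric_loss as n_params grows without bound, D fixed."""
    return E + B / n_tokens**BETA


if __name__ == "__main__":
    tokens = 300e9  # hypothetical fixed dataset size
    for n in (1e9, 1e10, 1e11, 1e12, 1e13):
        print(f"N={n:.0e}  loss={parametric_loss(n, tokens):.3f}")
    print(f"limit at D={tokens:.0e}: {fixed_data_limit(tokens):.3f}")
```

Under this particular functional form, growing the model on a fixed dataset keeps lowering the predicted loss, but by shrinking amounts, and the prediction never drops below the B/D^beta term plus E. A different functional form could behave differently, which is why the form itself is part of the disagreement.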