| HN Mirror



	by plorkyeran 1482 days ago
	We already know that more compute hours give better results, and a paper which simply consists of rerunning previous work but with 100x the compute hours for .03% better results has not discovered anything new, and there's no point in reading that paper.

3 comments

iakh 1482 days ago

I'm not in this space, but I'd expect that they wouldn't know the results beforehand, and that just publishing the results, even when they aren't a major improvement, tells the community that they don't need to spend the extra 99x compute. That seems valuable to some degree. Or is the argument that there wasn't any improvement to be had, so why even test the 100x extra?


Veedrac 1482 days ago

That might be a reasonable criticism if it remotely reflected reality, but scaling has repeatedly been shown to produce qualitatively stronger models, by large margins, doing things that would seem unimaginable for smaller models.


whiplash451 1482 days ago

We are not at this point yet. Recent work shows that more compute and more data let Transformers beat convnets on computer vision tasks. This is a lot more insightful than "more compute gets you a little further".
