Hacker News
by plorkyeran 1482 days ago
We already know that more compute hours give better results, and a paper which simply consists of rerunning previous work but with 100x the compute hours for .03% better results has not discovered anything new, and there's no point in reading that paper.
3 comments

I'm not in this space, but I'd expect that they wouldn't know the results beforehand and just publishing the results even when not a major improvement tells the community that they don't need to spend the extra 99x compute. That seems valuable to some degree. Or is the argument that there wasn't any improvement to be had so why even test the 100x extra?
That might be a reasonable criticism if it remotely reflected reality, but scaling has repeatedly been shown to produce qualitatively stronger models, by large margins, doing things that would seem unimaginable for smaller models.
We are not at this point yet. Recent work shows that more compute and more data let Transformers beat convnets on computer vision tasks. This is a lot more insightful than "more compute gets you a little further".