by Strilanc 1482 days ago
They explicitly say they trust the results. They're complaining that top labs use lots of compute, so the results aren't relevant to someone who can't. They give an example where a paper used 18K TPU core hours. It's easy to find papers that use millions of core hours.

IMO, asking AI people to not use expensive compute is like asking astronomers to please stop using expensive telescopes. The opposite side of this argument is "Gee, it looks like increasing compute helps AI a lot. Why the heck have we been spending so little on compute?" [0].

[0]: https://www.gwern.net/Scaling-hypothesis

We already know that more compute hours give better results. A paper that simply reruns previous work with 100x the compute hours for 0.03% better results has not discovered anything new, and there's no point in reading it.
I'm not in this space, but I'd expect that they wouldn't know the results beforehand and just publishing the results even when not a major improvement tells the community that they don't need to spend the extra 99x compute. That seems valuable to some degree. Or is the argument that there wasn't any improvement to be had so why even test the 100x extra?
That might be a reasonable criticism if it remotely reflected reality, but scaling has repeatedly been shown to produce qualitatively stronger models, by large margins, doing things that would seem unimaginable for smaller models.
We are not at this point yet. Recent work shows that more compute and more data let Transformers beat convnets on computer vision tasks. This is a lot more insightful than "more compute gets you a little further".
The main problem is that it kills the double blind nature of peer review.

If your paper says you trained on 1000 TPUs for weeks, we all know you work at Google Brain.

This is subversive for our field. It's really, really bad that these authors' papers are virtually guaranteed to be accepted for these reasons alone.