That quote is disingenuous. Do people really think that...
* Jeff Dean, lead of Google's AI division, wrote a paper with all that complexity to get SOTA on CIFAR-10?
* Jeff Dean, whose salary is sometimes estimated as $3m/y and is responsible for the direction of research of many more, is unreasonable for using <$60k of compute at public pricing, and less than that at internal pricing?
* going from a 0.6% error rate to a 0.57% error rate is reasonably summarized as ‘a 0.03% improvement’, ignoring both that it's a 5% reduction in error and that such improvements get harder as you approach (or exceed) the label accuracy of the dataset?
* the accuracy from this paper came purely from scale?
Still, it's a 0.03% difference, or 3 images out of the 10k test images in CIFAR-10. Just 3 images.
Re-training the SotA model with a different random seed could shift its score by 0.03%. Or a miscalculation somewhere in the 17,810 TPU core-hours, caused by faulty hardware or a cosmic ray hit, could leave the final model 0.03% off.
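To put a rough number on the "just 3 images" point, here is a minimal sketch of the sampling noise on a 10k-image test set. It assumes each test image is an independent pass/fail draw (a simple binomial model), which ignores that two models scored on the same test set are correlated; the function and variable names are illustrative, not from the paper.

```python
import math

def binomial_std_error(error_rate, n_images):
    """Standard error of an error rate measured on n_images independent samples."""
    return math.sqrt(error_rate * (1.0 - error_rate) / n_images)

n_test = 10_000      # CIFAR-10 test set size, as stated in the comment
old_err = 0.0060     # 0.6% error rate
new_err = 0.0057     # 0.57% error rate

se = binomial_std_error(old_err, n_test)
gap = old_err - new_err
images_changed = round(gap * n_test)

print(f"images that differ: {images_changed}")
print(f"standard error: {se * 100:.3f} percentage points")
print(f"gap: {gap * 100:.3f} percentage points = {gap / se:.2f} standard errors")
```

Under that assumption the gap is well under one standard error, which is why a single result like this cannot be told apart from noise on its own.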
The problem with this sort of argument against caring about SOTA scores is that there is only so much luck to go around. While any individual 5% reduction in error rates could theoretically be highly influenced by luck, if you have a chain of small reductions in error rates, such that the difference between the first and the last is more like a factor of 2, then you know that somewhere in the middle of that, even if any individual improvement is suspect, there must have been real, gradual improvement.
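The chain argument can be sketched with a little arithmetic: how many consecutive 5% relative reductions does it take to halve the error rate? This assumes every step is exactly a 5% relative reduction, which is an idealization; the helper name is made up for illustration.

```python
import math

def steps_to_reach(target_fraction, per_step_reduction):
    """Smallest number of steps after which the error is at most
    target_fraction of the starting error, if each step removes
    per_step_reduction of the remaining error."""
    return math.ceil(math.log(target_fraction) / math.log(1.0 - per_step_reduction))

steps = steps_to_reach(0.5, 0.05)
remaining = 0.95 ** steps

print(f"steps needed to halve the error: {steps}")
print(f"fraction of the original error left: {remaining:.3f}")
```

With these numbers it takes 14 such steps. For that whole chain to be a fluke, the luck would have to point the same way 14 times in a row.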
It isn't that important on CIFAR-10 any more, which is pretty much a solved benchmark, but CIFAR was only solved because of such incremental progress, and papers focusing on moving the state of the art use newer, much harder benchmarks.
> Re-training the SotA model with a different random seed could shift its score by 0.03%. Or a miscalculation somewhere in the 17,810 TPU core-hours, caused by faulty hardware or a cosmic ray hit, could leave the final model 0.03% off.
Isn’t it the job of science to determine if this is the case?
Put another way, though, the failure rate was decreased from 0.6% to 0.57%, a 5% reduction. That's pretty significant. If you could reduce the LASIK failure rate by 5%, that would provide a ton of value, even though you would be talking about an absolute improvement of 0.001% in success rate.
I agree that the improvements we are seeing are increasingly due to simply spending more time/money/power, but that quip is probably the weakest argument. I would have liked to see a Fermi calculation showing that the power used during training is only 1% (or probably much less) of the total power used. The other thing that reeks of naivety is ignoring that the world basically takes a lot of compute. Much more money and compute is wasted on Candy Crush, for instance.