Hacker News
by Veedrac 1482 days ago
That quote is disingenuous. Do people really think that...

* Jeff Dean, lead of Google's AI division, wrote a paper with all that complexity to get SOTA on CIFAR-10?

* Jeff Dean, whose salary is sometimes estimated as $3m/y and who is responsible for the direction of research of many more, is unreasonable for using <$60k of compute at public pricing, and less than that at internal pricing?

* going from a 0.6% error rate to a 0.57% error rate is reasonably summarized as ‘a 0.03% improvement’, ignoring both that it's a 5% reduction in error and that such improvements get harder as you approach (or exceed) the label accuracy of the dataset?

* the accuracy from this paper came purely from scale?
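
To make the arithmetic in the third bullet concrete, here is a minimal sketch of the two ways of reading the same change. The function name `error_reduction` is made up for illustration; the 0.6% and 0.57% figures are the ones quoted above.

```python
def error_reduction(old_error_pct, new_error_pct):
    """Return (absolute drop in percentage points, relative drop as a fraction)."""
    absolute = old_error_pct - new_error_pct
    relative = absolute / old_error_pct
    return absolute, relative


# Error rates from the comment above, in percent.
absolute, relative = error_reduction(0.60, 0.57)

print(f"absolute: {absolute:.2f} percentage points")
print(f"relative: {relative:.1%} of the remaining errors removed")
```

The same 0.03-point gap would be a far smaller relative change on a benchmark sitting at, say, 10% error, which is why the absolute number alone undersells it here.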


Still, it's a 0.03% difference, or 3 images out of the 10k test images in CIFAR-10. Just 3 images.

Re-training the SotA model with a different random seed could shift its score by 0.03%. Or a miscalculation somewhere in the 17,810 TPU core-hours, due to faulty hardware or a cosmic ray hit, could leave the final model off by 0.03%.
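
As a rough sanity check on how small 3 images is, here is a sketch that treats each of the 10k test images as an independent coin flip with a 0.6% chance of being misclassified. That binomial model is an assumption made for illustration; it is not a measurement of seed-to-seed variance, and the helper name `error_count_std` is made up.

```python
import math


def error_count_std(n_images, error_rate):
    """Standard deviation of the number of misclassified images under a
    simple binomial model (assumes each image is an independent trial)."""
    return math.sqrt(n_images * error_rate * (1.0 - error_rate))


n_test = 10_000                           # CIFAR-10 test set size
std_images = error_count_std(n_test, 0.006)
gap_images = n_test * (0.006 - 0.0057)    # the gap under discussion

print(f"one standard deviation: about {std_images:.1f} images")
print(f"gap between the two results: about {gap_images:.0f} images")
```

Under that crude model the gap is well inside one standard deviation, so a single comparison like this cannot separate the two results on its own.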

The problem with this sort of argument against caring about SOTA scores is that there is only so much luck to go around. While any individual 5% reduction in error rates could theoretically be highly influenced by luck, if you have a chain of small reductions in error rates, such that the difference between the first and the last is more like a factor of 2, then you know that somewhere in the middle of that, even if any individual improvement is suspect, there must have been real, gradual improvement.
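
To put a number on "a chain of small reductions", here is a small sketch counting how many successive 5% relative reductions it takes to halve an error rate. The 5% step and the factor of 2 come from the comments above; the function name `steps_to_shrink` is made up, and treating every step as exactly 5% is a simplification.

```python
import math


def steps_to_shrink(per_step_reduction, target_factor):
    """Number of successive relative reductions needed to shrink the error
    rate by at least target_factor (for example 2.0 for halving)."""
    remaining_per_step = 1.0 - per_step_reduction
    return math.ceil(math.log(target_factor) / -math.log(remaining_per_step))


steps = steps_to_shrink(0.05, 2.0)
print(f"{steps} steps of 5% each leave {0.95 ** steps:.1%} of the original error")
```

Any single one of those steps might be noise, but for the whole chain to be noise every step would have to be lucky in the same direction.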

It isn't that important on CIFAR-10 any more, which is pretty much a solved benchmark, but CIFAR was only solved because of such incremental progress, and papers focusing on moving the state of the art use newer, much harder benchmarks.

> Re-training the SotA model with a different random seed could shift its score by 0.03%. Or a miscalculation somewhere in the 17,810 TPU core-hours, due to faulty hardware or a cosmic ray hit, could leave the final model off by 0.03%.

Isn’t it the job of science to determine if this is the case?

Without running expensive experiments apparently.