by ezoe 1482 days ago
CIFAR-10 has 10,000 test images, so 0.03% of the test set is 3 images.

At numbers this small, randomness starts to affect the scores: human labeling mistakes in the test data, for example. Training the SotA model with a different random seed might well make its score 0.03% better or worse.
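To put a rough number on that, here is a back-of-the-envelope sketch in Python. It treats each test image as an independent coin flip and uses the binomial standard error; the 99% accuracy figure is my own assumption for illustration, not a number from the paper, and `accuracy_standard_error` is just a helper name I made up.

```python
import math

def accuracy_standard_error(accuracy: float, n_test: int) -> float:
    """Binomial standard error of an accuracy measured on n_test images."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_test)

N_TEST = 10_000          # size of the CIFAR-10 test set
GAIN = 0.0003            # the 0.03% improvement under discussion
ASSUMED_ACCURACY = 0.99  # assumption for illustration only

# How many test images does a 0.03% change correspond to?
images_flipped = round(GAIN * N_TEST)

# How large is the sampling noise of the accuracy estimate itself?
std_err = accuracy_standard_error(ASSUMED_ACCURACY, N_TEST)

print(f"0.03% of the test set = {images_flipped} images")
print(f"standard error at {ASSUMED_ACCURACY:.0%} accuracy = {std_err:.4%}")
print(f"gain / standard error = {GAIN / std_err:.2f}")
```

Under that assumption the standard error comes out around 0.1%, so a 0.03% gain is well under one standard error. A paired comparison of two models on the same images would be a tighter test, so take this only as an order-of-magnitude check.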

Hell, 17,810 TPU core-hours is a huge number. You can't ignore the role of randomness over that much compute. What if a cosmic ray hits a specific memory cell and causes a soft memory error, leading to a single wrong calculation that ultimately makes the final trained model differ by 0.03%?

So it's more like: "Jeff Dean spent enough money to feed a family of four for half a decade to get a 0.03% lottery win on CIFAR-10."

1 comment

TPUs make numerical errors more often than you'd expect, and it's not cosmic rays, it's QA errors: individual chips that passed QA at manufacturing but very occasionally, for specific inputs and operations, produce garbage. When you run on a full pod, many workloads will eventually see corruption, often in the form of a NaN propagating through critical data like the gradients or weights, which the training run cannot recover from.
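For anyone wondering what defending against that looks like, here is a minimal sketch in plain Python of one common idea: check the gradients for non-finite values and skip the update if any are found. This is not how Google's training stack handles it (I'm not describing that here); `all_finite` and `sgd_step` are hypothetical names, and the lists stand in for real tensors.

```python
import math

def all_finite(values):
    """Return True if no value is NaN or +/-inf."""
    return all(math.isfinite(v) for v in values)

def sgd_step(weights, grads, lr=0.1):
    """Apply one SGD update, or skip it if any gradient is non-finite.

    Returns (new_weights, applied).
    """
    if not all_finite(grads):
        # Keep the old weights so the bad value never reaches them.
        return list(weights), False
    return [w - lr * g for w, g in zip(weights, grads)], True

weights = [1.0, -2.0, 0.5]

# A healthy step updates the weights.
weights, applied_ok = sgd_step(weights, [0.1, 0.2, -0.3])

# A corrupted gradient (one NaN) is detected and the step is skipped.
weights_after_bad, applied_bad = sgd_step(weights, [0.1, float("nan"), -0.3])

print(applied_ok, applied_bad, weights_after_bad)
```

Note that a check like this only catches NaN and inf. A chip that returns a wrong but finite number slips straight through, which is the nastier case.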

In fact, a recent big paper from Google mentioned that training occasionally went wonky in completely nonreproducible ways, but I'm pretty sure I know what happened.