Hacker News
by sdenton4 2976 days ago
"As shown above, the top-1 accuracy after 90 epochs for the TPU implementation is 0.7% better. This may seem minor, but making improvements at this already very high level is extremely difficult and, depending on the application, such small improvements may make a big difference in the end."

Any idea of how much variation in accuracy you get on different training runs of the same model on the same hardware? My understanding is that model quality can and does vary from one run to the next on these kinds of large datasets - from a single observation, it's hard to know if the difference is real or noise.


I've been running a lot of these ResNet-50 experiments lately, and the run-to-run variation is very small, on the order of 0.1%. It's actually pretty amazing how consistent training is, given that the initialization is always different and the data is sampled differently on each run. (As an aside, it took us about three weeks to track down a bug that was causing the model to consistently reach an accuracy 1% lower than it was supposed to.)
Indeed, that's also my experience. ImageNet is pretty huge (although 'it's the new MNIST'), which seems to help runs converge to very similar solutions and accuracies.

Tracking down bugs in convergence is really costly in these settings. We had a problem in pre-processing that took us quite a while to figure out...
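
To make the noise question concrete, here is a minimal sketch of how one might compare a reported gap against run-to-run spread. The accuracy values below are made up for illustration (chosen so the spread is about 0.1, in line with the figure mentioned above), and `gap_in_noise_units` is a hypothetical helper, not anything from the article or the thread. It is only a rough check: it treats the candidate as a single observation and ignores the candidate's own variance.

```python
import statistics


def gap_in_noise_units(baseline_runs, candidate_acc):
    """Return (mean, sample std, gap / std) for one candidate accuracy
    measured against repeated runs of a baseline configuration."""
    mean = statistics.mean(baseline_runs)
    std = statistics.stdev(baseline_runs)  # sample std; needs at least 2 runs
    return mean, std, (candidate_acc - mean) / std


# Hypothetical top-1 accuracies (%) from five repeats of the same setup.
baseline_runs = [76.0, 76.1, 75.9, 76.1, 75.9]
# Hypothetical single result that is 0.7 points above the baseline mean.
candidate_acc = 76.7

mean, std, z = gap_in_noise_units(baseline_runs, candidate_acc)
print(f"baseline mean={mean:.2f}, std={std:.2f}, gap={z:.1f} std devs")
```

With a spread that small, a 0.7-point gap sits several standard deviations out, so it is unlikely to be seed noise alone. It could still come from a systematic difference such as pre-processing, which this kind of check cannot distinguish.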