by totoglazer 2537 days ago
I think it’s probably because the benchmark isn’t optimized for TPU Pods. Check out the “BERT in 76 minutes” paper to see how you need to rethink the training regime to take advantage of Pods.

Yes, Cloud TPU Pods are designed to train much larger models on much larger datasets. And, as you mention, if you are willing to adjust your model architectures and training algorithms to take full advantage of the hardware, you can sometimes achieve substantial gains.