HN Mirror


	by totoglazer 2537 days ago
	I think it’s probably because the benchmark isn’t optimized for TPU Pods. Check out the BERT in 76 minutes paper for how you need to rethink the training regime to take advantage of pods.

1 comment

zak 2537 days ago

Yes, Cloud TPU Pods are designed to train much larger models on much larger datasets. And, as you mention, if you are willing to adjust your model architectures and training algorithms to take full advantage of the hardware, you can sometimes achieve substantial gains.
