by ArthurBrussee 2528 days ago
Is it just me, or are those results somewhat underwhelming? Dedicated hardware for a 2x speedup at best, a toss-up for most results, and it only competes in some categories. Not to be just an NVIDIA fan here, and surely there is value in dedicated training hardware, but it's surprising that the benefit isn't bigger!
3 comments

Disclosure: I work on Google Cloud (even with Zak sometimes).

The DGX-2h is a beast! Don’t parse this as “huh, TPU Pods are about the same as just a few V100s”. The data sheet [1] is probably the easiest to follow, but their writeup is more informative [2].

These are souped-up V100s with awesome networking, which is pretty similar in style to a TPU Pod. So I’d say that they’re both purpose-built systems for distributed ML training. The name for the NVIDIA system is even “DGX SuperPOD” :).

[1] https://www.nvidia.com/content/dam/en-zz/es_em/Solutions/Dat...

[2] https://devblogs.nvidia.com/dgx-superpod-world-record-superc...

I think it’s probably because the benchmark isn’t optimized for TPU Pods. Check out the “BERT in 76 minutes” paper for how you need to rethink the training regime to take advantage of pods.

Yes, Cloud TPU Pods are designed to train much larger models on much larger datasets. And, as you mention, if you are willing to adjust your model architectures and training algorithms to take full advantage of the hardware, you can sometimes achieve substantial gains.

Author of the blog post here.

As mentioned in other comments, I'd recommend doing a performance-per-dollar comparison in addition to looking at this pure performance comparison at maximum scale.
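
To make the performance-per-dollar idea concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the system names, training times, and hourly prices are made-up placeholders, not numbers from the benchmark results or from any price list, and it assumes simple pro rata hourly billing.

```python
def cost_per_run(minutes_to_train: float, price_per_hour: float) -> float:
    """Dollar cost of one training run, assuming pro rata hourly billing."""
    return minutes_to_train * price_per_hour / 60.0


def cheapest(systems: dict) -> str:
    """Return the name of the system with the lowest cost per training run."""
    return min(
        systems,
        key=lambda name: cost_per_run(
            systems[name]["minutes"], systems[name]["price_per_hour"]
        ),
    )


# Hypothetical placeholder values, NOT real benchmark results or prices.
systems = {
    "system_a": {"minutes": 2.0, "price_per_hour": 600.0},  # faster, pricier
    "system_b": {"minutes": 4.0, "price_per_hour": 200.0},  # slower, cheaper
}

if __name__ == "__main__":
    for name, spec in systems.items():
        cost = cost_per_run(spec["minutes"], spec["price_per_hour"])
        print(f"{name}: {spec['minutes']} min, ${cost:.2f} per run")
    print("lowest cost per run:", cheapest(systems))
```

With placeholder inputs like these, the system that wins on raw time-to-train is not the one that wins on cost per run, which is why it is worth looking at both rankings.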