by sillysaurusx 2094 days ago
You're almost certainly using the TPUs wrong. It's very easy to use them wrong, unfortunately.

When you use them right, a TPUv3-8 gets performance equivalent to a cluster of 8 V100s.

I was astounded. I trained StyleGAN 2 from scratch at 1024x1024 in 2.5 days. NVIDIA took 7 days to train their official model. Granted, I used a v3-32, not a v3-8, but performance seems pretty similar.