Hacker News new | ask | show | jobs
by boulos 3036 days ago
Disclosure: I work on Google Cloud.

Depends on your metric, Jonathan! If you focus on the per dollar numbers, then it’s actually net favorable to the V100, because a second GPU over NVLINK won’t be as cost-efficient. If what you care about is raw throughput “in a single box”, then 8xV100 probably comes out ahead here.

Like someone else below though, I worry about the “hey wait a minute, changing the batch size just for the TPU seems unfair” and the whole “the LSTM didn’t converge” bit. Not a bad first draft, but hopefully the authors can do some more comparisons.

1 comments

Author here.

Thanks for your feedback. As I noted above, we will report further results with larger batch sizes (and smaller ones for the TPU). The LSTM not converging is one of the experiences we wanted to share. We are working on solving this issue and will update the post accordingly. Our goal is really a fair and valuable comparison, which is not easy, so we value all of the feedback.