by spi 1988 days ago
"Similar performance" still means 30%-50% slower [1] with half the RAM, which is not really that comparable.

For much closer performance you should get a 2080ti, which should be roughly comparable in speed and has 11GB [edit: wrongly wrote 14GB before] of memory (against 16GB for the V100). Price-wise you still save a lot of money: after some quick googling, roughly $1200 vs. $15k-$20k.
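
To put those rough numbers side by side, here is a small back-of-the-envelope sketch in Python. It uses only the figures quoted above (about $1200 and 11GB for the 2080ti, $15k-$20k and 16GB for the V100). The prices come from a quick search rather than a price list, so treat the result as an order-of-magnitude comparison; the names `cards` and `usd_per_gb` are just illustrative.

```python
# Back-of-the-envelope comparison using the figures quoted in the comment.
# Prices are rough search results, not authoritative list prices.
cards = {
    "2080ti": {"price_usd": (1200, 1200), "memory_gb": 11},
    "V100": {"price_usd": (15_000, 20_000), "memory_gb": 16},
}


def usd_per_gb(card):
    """Return the (low, high) price per GB of GPU memory."""
    low, high = card["price_usd"]
    return low / card["memory_gb"], high / card["memory_gb"]


for name, card in cards.items():
    low, high = usd_per_gb(card)
    print(f"{name}: ${low:,.0f} to ${high:,.0f} per GB")

# How much more does one V100 cost than one 2080ti?
price_2080ti = cards["2080ti"]["price_usd"][0]
ratio_low = cards["V100"]["price_usd"][0] / price_2080ti
ratio_high = cards["V100"]["price_usd"][1] / price_2080ti
print(f"One V100 costs about {ratio_low:.1f}x to {ratio_high:.1f}x a 2080ti")
```

This only compares price and memory; it says nothing about speed, which depends on the workload.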

But you still lose something, e.g. if you use half precision on V100 you get virtually double speed, if you do on a 1080 / 2080 you get... nothing because it's not supported.

(and more importantly for companies, you can actually use only V100-style stuff on servers [edit: as you mentioned already, although I'm not 100% sure it's just drivers that are the issue?])

[1] I've not used the 1080 myself, but I've used the 1080ti and the V100 extensively, and the latter is about 30% faster. Hence my estimate for the comparison with the 1080.

3 comments

For my workload (optical flow) I was honestly surprised to see that the Google Cloud V100 was not faster than my local GTX 1080. So I guess that varies a lot by how you're training, too.

For many of my AI training workloads, the 1080 is already "fast enough" and the CPU or the SSDs are the bottleneck. In that case, the GPU doesn't really matter that much.
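
A toy model of that situation, as a hedged sketch: if data loading is prefetched and overlaps with GPU compute, the slower of the two stages sets the time per step. The timings below are made-up assumptions, not measurements, and `step_time` is a hypothetical helper.

```python
def step_time(load_s, gpu_s):
    """Time per training step when data loading overlaps with GPU compute.

    With prefetching, the slower of the two stages sets the pace.
    """
    return max(load_s, gpu_s)


# Hypothetical per-batch timings in seconds (assumptions, not measurements).
load_s = 0.10                   # CPU decoding + SSD reads
gpu_1080_s = 0.08               # forward + backward pass on the slower GPU
gpu_fast_s = gpu_1080_s / 1.3   # a GPU that is 30% faster

before = step_time(load_s, gpu_1080_s)
after = step_time(load_s, gpu_fast_s)
print(f"step time: {before:.3f}s -> {after:.3f}s, speedup {before / after:.2f}x")
```

Under these assumptions the faster GPU changes nothing, because the input pipeline was already the slower stage.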

Yes, that might be the case. I mostly trained big networks (tens to hundreds of millions of parameters) made mainly of 3x3 convolutions, and I think the V100 has dedicated hardware for that. Then, as I mentioned, you can get a further 2x speedup by using half precision.
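
For a sense of scale, here is a rough sketch that counts the weights in a stack of 3x3 convolutions and the memory they take in single versus half precision. The channel widths are hypothetical, not the networks described above, and this counts weight storage only: activations, gradients and optimizer state are ignored, and it does not model speed.

```python
def conv3x3_params(c_in, c_out, bias=True):
    """Number of weights in one 3x3 convolution layer."""
    return 3 * 3 * c_in * c_out + (c_out if bias else 0)


# Hypothetical channel widths for a plain stack of 3x3 convolutions.
channels = [3, 64, 128, 256, 512, 512, 512, 512, 512]

total = sum(conv3x3_params(a, b) for a, b in zip(channels, channels[1:]))

# Storage for the weights alone: 4 bytes per value in fp32, 2 in fp16.
fp32_mb = total * 4 / 1e6
fp16_mb = total * 2 / 1e6
print(f"{total:,} parameters: {fp32_mb:.1f} MB in fp32, {fp16_mb:.1f} MB in fp16")
```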

If you train smaller models, or RNNs, you probably lose most of the gains from the dedicated hardware. But I guess that, for this same reason, the experiments in the article are little more than a provocation: I don't know if you could train a big network in finite time on M1 chips...

That said, of course, if the budget was mine, I wouldn't buy a V100 :-)

> But you still lose something, e.g. if you use half precision on V100 you get virtually double speed, if you do on a 1080 / 2080 you get... nothing because it's not supported.

That's not true. FP16 is supported and can be fast on the 2080, although some frameworks fail to see the speed-up. I filed a bug report about this a year ago: https://github.com/apache/incubator-mxnet/issues/17665

What consumer GPUs lack is ECC and fast FP64.

How does AMD stuff like the Radeon VII or the MI100 hold up?

Can't use it: most AI frameworks won't run on AMD because they have not implemented suitable back-ends (yet).

There's one for PyTorch; I tested it about a year ago. You have to compile it from scratch and, IIRC, it translates/compiles CUDA to ROCm at runtime, which causes noticeable pauses on the first run. There may be other tweaks you have to do too. Once set up, it performs decently, though.