	by helsinkiandrew 1988 days ago
	Can someone with more knowledge of Nvidia GPUs please say how much the V100 costs ($5-10K?) compared with the $900 Mac mini?

fxtentacle 1988 days ago

You would instead buy a used 1080 (no ti) for similar performance.

The special thing about the V100 is that its driver EULA allows data center usage. If you don't need that, there are other much cheaper options.

spi 1988 days ago

"Similar performance" still means 30%-50% slower [1] and half the RAM, so it's not really that comparable.

For much closer performance you should get a 2080ti, which should be roughly comparable in speed and have 11GB [edit: wrongly wrote 14GB before] of memory (against the 16GB for the V100). Price-wise you still save a lot of money: after some quick googling, roughly $1200 vs. $15k-$20k.

But you still lose something, e.g. if you use half precision on V100 you get virtually double speed, if you do on a 1080 / 2080 you get... nothing because it's not supported.

(and more importantly for companies, you can actually use only V100-style stuff on servers [edit: as you mentioned already, although I'm not 100% sure it's just drivers that are the issue?])

[1] I've not used 1080 myself, but I've used 1080ti and V100 extensively, and the latter is about 30% faster. Hence my estimate for comparison with 1080

fxtentacle 1988 days ago

For my workload (optical flow) I was honestly surprised to see that the Google Cloud V100 was not faster than my local GTX 1080. So I guess that varies a lot by how you're training, too.

For many of my AI training workloads, the 1080 is already "fast enough" and the CPU or SSDs are the bottleneck. In that case, the GPU doesn't really matter that much.

spi 1988 days ago

Yes that might be the case. In my case I mostly trained big (tens to hundreds of millions of parameters) networks mostly made of 3x3 convolutions, and I think the V100 has dedicated hardware for that. Then as I mentioned you can get a further 2x speedup by using half precision.
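
Besides speed, half precision also halves the bytes stored per weight. A minimal Python sketch of that arithmetic for models in the size range mentioned here; the parameter counts are illustrative picks, and this counts weights only (activations, gradients and optimizer state are ignored and usually take far more memory):

```python
# Rough memory needed just to store a model's weights at a given precision.
# Assumption: weights only; activations, gradients and optimizer state are ignored.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_mib(num_params: int, precision: str) -> float:
    """Return the size of the weights alone, in MiB."""
    return num_params * BYTES_PER_PARAM[precision] / 2**20


# Illustrative model sizes ("tens to hundreds of millions of parameters").
for n in (10_000_000, 100_000_000, 500_000_000):
    fp32 = weight_memory_mib(n, "fp32")
    fp16 = weight_memory_mib(n, "fp16")
    print(f"{n:>11,} params: {fp32:8.1f} MiB fp32, {fp16:8.1f} MiB fp16")
```

Even at the top of that range the weights fit easily on either card, so the memory that matters in practice comes from batch size and activations, which this sketch does not model.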

If you train smaller models, or RNNs, you probably lose most of the gains of dedicated hardware. But I guess that, for this same reason, the experiments in the article are little more than a provocation: I don't know if you could train a big network in finite time on M1 chips...

That said, of course, if the budget was mine, I wouldn't buy a V100 :-)

trott 1988 days ago

> But you still lose something, e.g. if you use half precision on V100 you get virtually double speed, if you do on a 1080 / 2080 you get... nothing because it's not supported.

That's not true. FP16 is supported and can be fast on 2080, although some frameworks fail to see the speed-up. I filed a bug report about this a year ago: https://github.com/apache/incubator-mxnet/issues/17665

What consumer GPUs lack is ECC and fast FP64.

FeepingCreature 1988 days ago

How does AMD stuff like Radeon VII or MI100 hold up?

fxtentacle 1988 days ago

Can't use it because most AI frameworks won't run on AMD because they did not implement suitable back-ends (yet).

breuleux 1988 days ago

There's one for PyTorch; I tested it about a year ago. You have to compile it from scratch and IIRC it translates/compiles CUDA to ROCm at runtime, which causes noticeable pauses on the first run. There may be other tweaks you have to do too. Once set up it performs decently, though.

littlestymaar 1988 days ago

> The special thing about the V100 is that it's driver EULA allows data center usage.

Wait what? Is it the only thing?

That sounds hard to believe: if true, using the open driver (Nouveau) instead of Nvidia's proprietary one would be a massive money saver for datacenter operators (and even if Nouveau doesn't already support the features you'd want, supporting its development would be much cheaper for a company like Amazon than paying a premium on every GPU they buy).

rrss 1988 days ago

No, that's not the only thing.

Other characteristics of V100 that may be interesting to people buying GPUs for data centers:

- higher capacity GPU memory. 1080 has 8 GB, V100 has 16 or 32 GB.

- higher bandwidth GPU memory. V100 has HBM2 with a peak of 900 GB/s, 1080 has G5X with a peak of ~300 GB/s.

- ECC support.

- data center certification + warranty

(The geforce warranty covers normal consumer usage, like gaming, and does not cover datacenter use)

- availability of enterprise support contracts.

(If you are buying a ton of GPUs to put in a datacenter, you probably don't want to end up on the normal consumer support line when something goes wrong)

- fast fp64

There are probably others
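
To put the bandwidth gap in concrete terms, here is a small Python sketch using only the peak figures quoted above. The 8 GB buffer is a hypothetical working set chosen because it fits on both cards, and real kernels rarely reach peak bandwidth, so treat the times as best-case lower bounds:

```python
# Peak memory bandwidth figures quoted in the comment above (GB/s).
PEAK_BANDWIDTH_GBPS = {"V100 (HBM2)": 900, "GTX 1080 (G5X)": 300}

BUFFER_GB = 8  # hypothetical working set; small enough to fit on both cards


def sweep_time_ms(buffer_gb: float, bandwidth_gbps: float) -> float:
    """Best-case time to stream a buffer through GPU memory once, in ms."""
    return buffer_gb / bandwidth_gbps * 1000


for name, bandwidth in PEAK_BANDWIDTH_GBPS.items():
    ms = sweep_time_ms(BUFFER_GB, bandwidth)
    print(f"{name}: {ms:.1f} ms per pass over {BUFFER_GB} GB")
```

For memory-bound workloads the roughly 3x ratio is what counts, regardless of the buffer size picked.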

Firadeoclus 1987 days ago

A GTX1080 manages ~9 TFLOPS (fp32) (and has terrible fp16 support), where V100 gets ~15 TFLOPS (fp32), ~30 TFLOPS (fp16), and ~120 TFLOPS (tensor cores).

Apart from one being a gaming product and the other being designed for computational tasks, they're a generation apart and have various small differences that may be quite relevant for individual tasks (such as V100 allowing twice the shared memory - 96 KiB - per thread block)
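
As a quick sanity check on those peak numbers, a Python sketch of the ratios, taking the V100 figures as ~15 (fp32), ~30 (fp16) and ~120 (tensor cores) TFLOPS. These are theoretical peaks; as noted elsewhere in the thread, real workloads can be bottlenecked by CPU or storage and see far less:

```python
# Approximate peak throughput figures from the comment above (TFLOPS).
GTX1080_FP32_TFLOPS = 9
V100_TFLOPS = {"fp32": 15, "fp16": 30, "tensor cores": 120}


def speedup_over_1080(v100_tflops: float) -> float:
    """Ratio of a V100 peak figure to the GTX 1080's fp32 peak."""
    return v100_tflops / GTX1080_FP32_TFLOPS


for mode, tflops in V100_TFLOPS.items():
    ratio = speedup_over_1080(tflops)
    print(f"V100 {mode}: {ratio:.1f}x the 1080's fp32 peak")
```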

littlestymaar 1988 days ago

Thanks, that makes much more sense!

fsh 1988 days ago

Nouveau does not support CUDA and is therefore not usable for GPU computing on Nvidia.

YetAnotherNick 1988 days ago

NVIDIA has a EULA to prevent data centre use of their hardware. Also, NVIDIA does not allow bulk buying of the RTX series.

alickz 1988 days ago

They barely allow single buying for the 30 series :(

Took me quite a while to get my hands on a 3080.

jklehm 1988 days ago

What ended up working for you?

alickz 1987 days ago

I bought from a (relatively) small German commerce site[1] rather than a bigger site like Amazon, OCUK, or Scan. I'm in the EU though, so this probably doesn't help if you're in the US. I think I paid a €50 or so premium over the retail price but I didn't mind that too much.

I used this[2] site to keep an eye open for stock, as you can see it's pretty much empty now but I just checked every day and finally found one.

[1] https://www.reichelt.de/

[2] https://www.gputracker.eu/en/search/category/1/graphics-card...

sillysaurusx 1988 days ago

Don't buy hardware in general for AI work, IMO. It'll be out of date in a year and you'll end up training in the cloud anyway.

dx034 1988 days ago

If you properly utilize your hardware, on premise (or colocation in an area with cheap electricity prices) is vastly cheaper and will likely continue to be for a while. I don't see how training models in the cloud makes financial sense for organizations that can utilize their hardware 24/7.

For all others with burst workloads training in the cloud can make sense, but that has been the case for a while already.

sillysaurusx 1988 days ago

We're not talking about organizations, though. I don't agree with your premise, either. People aren't training models 24/7, so the idea that it's "vastly cheaper and will continue to be for a while" isn't true.

king_magic 1988 days ago

> People aren’t training models 24/7

... uh, you sure about that? Let me go check on the 3 models I have concurrently training for my organization on 3 separate GPU servers (all 2 year old hardware to boot) that have been running continuously for the past 36 hours. It pretty much works out to 24/7 training for the past several months.

And BTW, this is massively cheaper for us than training in the cloud.

qayxc 1988 days ago

Instead of arguing back and forth, how about a test case instead?

Pretraining BERT takes 44 minutes on 1024 V100 GPUs [1]

This requires dedicated instances, since shared instances won't be able to get to peak performance if only because of the "noisy neighbour"-effect.

At GCP, a V100 costs $2.48/h [2], so Microsoft's experiment would've cost $2,539.52.

Smaller providers offer the same GPU at just $1.375/h [3], so a reasonable lower limit would be around $1,408.
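
Both rental figures work out to 1024 GPU-hours, which suggests each GPU is counted for a full hour even though the run takes 44 minutes. A short Python sketch reproducing them under that assumption (the one-hour billing is my reading, not something the comment states), with the pro-rata cost shown for comparison:

```python
NUM_GPUS = 1024
RUN_MINUTES = 44
BILLED_HOURS_PER_GPU = 1  # assumption: the 44-minute run is billed as a full hour


def rental_cost(price_per_gpu_hour: float) -> float:
    """Cost of one run when every GPU is billed for a whole hour."""
    return NUM_GPUS * BILLED_HOURS_PER_GPU * price_per_gpu_hour


def pro_rata_cost(price_per_gpu_hour: float) -> float:
    """Cost of one run if billing were exact to the minute (hypothetical)."""
    return NUM_GPUS * RUN_MINUTES / 60 * price_per_gpu_hour


for provider, price in (("GCP", 2.48), ("smaller provider", 1.375)):
    print(f"{provider}: ${rental_cost(price):,.2f} per run "
          f"(${pro_rata_cost(price):,.2f} pro rata)")
```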

For a single BERT pretraining, provided highly optimised workflows and distributed training scripts are already at hand, renting a GPU for single training tasks seems to be the way to go.

The cost of V100-equivalent end-user hardware (we don't need to run in a datacentre, dedicated workstations will do), is about $6,000 (e.g. a Quadro RTX 6000), provided you don't need double precision. The card will have equal FP32 performance, lower TGP and VRAM that sits between the 16 GB and 32 GB version of the V100.

Workstation hardware to go with such a card will cost about $2,000, so $8,000 is a reasonable cost estimate. The cost of electricity varies between regions, but in the EU the average non-household price is about 0.13€/kWh [4].

Pretraining BERT therefore costs an estimated 1024 h * 0.13€/kWh * 0.5 kW ≈ 57€ in electricity (power consumption estimated from TGP + typical power consumptions of an Intel Xeon workstation from my own measurements when training models).

In order to get the break-even point we can use the following equation: t * $1,408 = $8,000 + t * $69, which results in t = 8,000/(1408-69) or t > 5.

In short, if you pretrain BERT 6 times, you save money by BUYING a workstation and running it locally rather than renting cloud GPUs from a reasonably cheap provider.
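
The break-even arithmetic can be checked with a few lines of Python. This sketch takes the comment's figures as given ($8,000 hardware, $1,408 rental per run, $69 local cost per run, the last presumably the electricity estimate converted to dollars). Note that multiplying the stated electricity inputs gives about 66.6€ rather than 57€, so the per-run local cost should be treated as approximate:

```python
import math


def electricity_cost_eur(hours: float, price_eur_per_kwh: float, power_kw: float) -> float:
    """Energy cost of one run on the local workstation."""
    return hours * price_eur_per_kwh * power_kw


def break_even_runs(hardware_usd: float, rent_per_run_usd: float, local_per_run_usd: float) -> float:
    """Solve t * rent = hardware + t * local for t."""
    return hardware_usd / (rent_per_run_usd - local_per_run_usd)


energy = electricity_cost_eur(hours=1024, price_eur_per_kwh=0.13, power_kw=0.5)
t = break_even_runs(hardware_usd=8_000, rent_per_run_usd=1_408, local_per_run_usd=69)

print(f"electricity per run: {energy:.2f} EUR")
print(f"break-even at {t:.2f} runs, so buying wins from run {math.floor(t) + 1}")
```

Because the rental price dominates the denominator, the result stays close to six runs even if the electricity estimate is off by ten or twenty dollars.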

This example only concerns BERT, but you can use the same reasoning for any model that you know the required compute time and VRAM requirements of.

This only concerns training, too - inference is a whole different can of worms entirely.

[1] https://www.deepspeed.ai/news/2020/05/27/fastest-bert-traini...

[2] https://cloud.google.com/compute/gpus-pricing

[3] https://www.exoscale.com/syslog/new-tesla-v100-gpu-offering/

[4] https://ec.europa.eu/eurostat/statistics-explained/index.php...
