| HN Mirror



	by squarefoot 3517 days ago
	Some background. https://www.youtube.com/watch?v=IVpOyKCNZYw Shouldn't be a problem anymore under Linux as most distros today install Nouveau drivers by default. https://nouveau.freedesktop.org/wiki/

2 comments

greydius 3517 days ago

Nouveau has no CUDA support, unfortunately.


valarauca1 3516 days ago

CUDA is proprietary to Nvidia.

CUDA only exists because Nvidia is attempting to pretend OpenCL, Vulkan, and DX12 don't exist [1]. These require hardware scheduling on the GPU to switch shaders, rather than dedicating X amount of chip hardware to Y shader for Z ms.

It should be noted that for GPGPU compute, Nvidia is not the correct choice. AMD RX 480 has 5.8TFLOPS @$200 ($37/TFLOP) vs Nvidia GTX1080 8.9TFLOPS @$600 ($67/TFLOP). In reality you should be doing GPU programming in OpenCL so you are GPU agnostic. You can switch vendors or platforms seamlessly (in most cases, if you avoid proprietary extensions) and even target AMD64, ARM, and POWER8/9 hardware.
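The price-per-TFLOP comparison above is a single division, sketched here in Python. The prices and peak figures are the ones quoted in the comment; the function name `dollars_per_tflop` is made up for illustration. Note that $200 / 5.8 TFLOPS works out closer to $34/TFLOP than the quoted $37, which would correspond to a price of roughly $215.

```python
def dollars_per_tflop(price_usd: float, peak_tflops: float) -> float:
    """Purchase price divided by peak single-precision throughput."""
    return price_usd / peak_tflops


# (price in USD, peak TFLOPS) exactly as quoted in the comment above
cards = {
    "AMD RX 480": (200.0, 5.8),
    "Nvidia GTX 1080": (600.0, 8.9),
}

for name, (price, tflops) in cards.items():
    print(f"{name}: ${dollars_per_tflop(price, tflops):.1f}/TFLOP")
```

This only compares list price against a theoretical peak; it says nothing about what a real kernel achieves on either card.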

That being said, I own a boatload of Nvidia stock because their marketing is excellent. Really, marketing is all that 80% of people pay attention to. CUDA has some great marketing around it. In reality CUDA is slower than OpenCL (on Nvidia's platforms even) and no easier to work in.

[1] https://postimg.org/image/vsnidk8p5/


nhaehnle 3516 days ago

It's worth pointing out that ROCm is basically AMD's answer to CUDA. Similar programming model and everything.

Let's hope it gets picked up by machine learning frameworks etc., because this market badly needs the competition, as your comparison of per-dollar raw performance numbers shows.


maksimum 3516 days ago

I agree with your point about avoiding vendor lock-in, something I experienced for myself with MATLAB. I also happened to buy an RX 480 recently, so I'm happy to hear it's good for GPGPU.

But I'm curious about how the FLOPS on these cards were measured. For example, one concern I have is that presumably these two cards have slightly different levels of parallelism. So it may be more or less difficult to extract the full performance from a particular card due to parallelism overhead. Then there's driver overhead, ease of programming, etc.


valarauca1 3516 days ago

FLOPS is always calculated via the simple formula

      F * Hz * 2 = FLOPS

Where F is the number of FPU front ends (SIMD and scalar). This is wrong because scalar math is often slower than SIMD, and compute kernels rarely run on the scalar pipeline.

Where Hz is, well, the clock rate in cycles per second. This is wrong because stalls happen (memory transfers, cache misses, etc.). It is also wrong because the clock rate is throttled and you are not always at maximum boost clock.

Then multiply by 2 for FMA (fused multiply-add), which counts as two floating-point operations. This is wrong because, well, not every operation is a one-cycle FMA. Division can take many cycles (>100). Also, scalar pipelines don't have FMA.

Ultimately all vendors use the same crappy calculation, so we are comparing apples to apples. Just rotten apples to rotten apples. It gives you an ideal-case figure you can optimize towards but never actually attain.
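As a rough sketch, the peak estimate described above is the number of FP lanes times the clock rate times two (one FMA per lane per cycle). The shader counts and boost clocks below are not from this thread; they are the commonly published specs for these cards (2304 stream processors at 1266 MHz for the RX 480, 2560 CUDA cores at 1733 MHz for the GTX 1080) and should be treated as assumptions. The function name `peak_flops` is hypothetical.

```python
def peak_flops(fp_lanes: int, clock_hz: float, ops_per_cycle: int = 2) -> float:
    """Theoretical peak: every lane retires one FMA (2 ops) every cycle.

    This ignores stalls, throttling, and non-FMA instructions, which is
    exactly why real kernels never reach it.
    """
    return fp_lanes * clock_hz * ops_per_cycle


# Assumed published specs (not stated in the thread): lanes, boost clock in Hz
rx_480 = peak_flops(2304, 1.266e9)
gtx_1080 = peak_flops(2560, 1.733e9)

print(f"RX 480:   {rx_480 / 1e12:.2f} TFLOPS")
print(f"GTX 1080: {gtx_1080 / 1e12:.2f} TFLOPS")
```

With those assumed specs the results land on the 5.8 and 8.9 TFLOPS figures quoted earlier in the thread, which suggests the vendor numbers are computed at boost clock.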


Kubuxu 3516 days ago

There is a difference in job scheduling between AMD and Nvidia. So if you want to optimize your OpenCL applications, you can do it only for one of them or do it twice.

The same applies to integer math, long double math, and so on.


nitrogen 3516 days ago

Do the power consumption numbers cancel out the up-front price advantage?


valarauca1 3516 days ago

RX480 consumes less power than the GTX1080, so they'd amplify the initial price advantage.


nitrogen 3515 days ago

Is that total power, or power per GFLOP?


valarauca1 3515 days ago

Are you seriously asking me to do division for you? Do you own a calculator, cellphone or computer? Or are you actually that helpless?

AMD RX 480 has 5.8TFLOPS @$200 ($37/TFLOP) @231Watt Peak (25GFLOPS/Watt)

Nvidia GTX1080 8.9TFLOPS @$600 ($67/TFLOP) @318Watt Peak (28GFLOPS/Watt)

See Furmark benchmark for wattage values [1]

A typical electricity price in the US is $0.12/kWh [2]. So the delta-cost of GTX1080 vs RX480 will be offset by GFLOPS/Watt efficiency savings in 4 years, 4 months. On a typical 2-, 3-, or even 4-year hardware replacement cycle, the extra cost will NEVER be recouped.

[1] http://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1... for load wattage numbers

[2] http://www.npr.org/sections/money/2011/10/27/141766341/the-p...
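The per-watt figures above can be checked with a short Python sketch: 5.8 TFLOPS is 5800 GFLOPS, and dividing by 231 W gives about 25 GFLOPS/W; likewise 8900 / 318 is about 28 GFLOPS/W. The `payback_years` helper is hypothetical and only shows the shape of a break-even calculation. The comment does not state the workload or duty cycle behind its 4 years, 4 months figure, so this sketch does not try to reproduce it, and the 100 W saving in the example call is an invented placeholder.

```python
def gflops_per_watt(peak_tflops: float, peak_watts: float) -> float:
    """Peak throughput per watt, converting TFLOPS to GFLOPS first."""
    return peak_tflops * 1000.0 / peak_watts


def payback_years(price_delta_usd: float, watts_saved: float,
                  usd_per_kwh: float, hours_per_year: float = 8760.0) -> float:
    """Years of continuous use before power savings cover the extra price."""
    yearly_savings_usd = (watts_saved / 1000.0) * hours_per_year * usd_per_kwh
    return price_delta_usd / yearly_savings_usd


# Figures quoted in the comment above
print(f"RX 480:   {gflops_per_watt(5.8, 231):.1f} GFLOPS/W")
print(f"GTX 1080: {gflops_per_watt(8.9, 318):.1f} GFLOPS/W")

# Hypothetical inputs: $400 price gap, 100 W saved, $0.12/kWh, running 24/7
print(f"Break-even: {payback_years(400.0, 100.0, 0.12):.2f} years")
```

The break-even time is very sensitive to the assumed wattage saving and hours of use, so any single number should be read alongside those assumptions.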


zanny 3517 days ago

It can't, Nouveau is not developed by Nvidia.


shmerl 3516 days ago

Nouveau has no reclocking, so it's practically useless as is. If you want a working open driver - use AMD.
