by dragontamer 2707 days ago

Well, the Radeon VII looks like it performs at around the 1080 Ti / 2080 level for $699.

I think the main issue with AMD is that their compute drivers are clearly behind NVidia's. However, their ROCm development is now on GitHub, so we can publicly see releases and various development actions. AMD has been active on GitHub, so the drivers are clearly improving.

But I think it is surprising to see just how far behind they are. ROCm is rewriting OpenCL from scratch, and HIP / HCC / etc. are built on top of C++ AMP but otherwise seem to be built from scratch as well. As such, there are still major issues like "ROCm / OpenCL doesn't work with Blender 2.79 yet".

And since ROCm / OpenCL is a different compiler, it has different performance characteristics compared to AMDGPU-PRO (the old OpenCL compiler). So code that ran quickly on AMDGPU-PRO (e.g. LuxRender) may run slowly on ROCm / OpenCL (or worst case: not at all, due to compiler errors or whatnot).

EDIT: And the documentation... NVidia offers extremely good documentation. Not only a complete CUDA guide, but a "performance" guide, documented latencies on various instructions (not like Agner Fog level, but useful to understand which instructions are faster than others), etc. etc. AMD used to have an "OpenCL Optimization Guide" with similar information, but it hasn't been updated since the 7970.

EDIT: AMD's Vega ISA documentation is lovely though. But it's a bit too low level, and while it gives a great idea of how the GPU executes at an assembly level, it doesn't really have much about how OpenCL relates to it, or optimization tips for that matter. There are certainly nifty features, like DPP, or ds_permute instructions which probably can be used in a Bitonic Sort or something, but there's almost no "OpenCL-level" guide to how to use those instructions. (aside from: https://gpuopen.com/amd-gcn-assembly-cross-lane-operations/. That's basically the best you've got)
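To make the Bitonic Sort idea a bit more concrete, here is a minimal CPU-side sketch in Python that simulates the lanes of a wavefront as a list. At each step, lane `i` reads the value held by lane `i ^ j`, which is the kind of lane-to-lane exchange a cross-lane instruction could provide. This is only a simulation of the data movement, not GPU code, and the function name `bitonic_sort_lanes` is made up for illustration.

```python
def bitonic_sort_lanes(values):
    """Sort a power-of-two-length list by simulating lane-to-lane exchanges."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "lane count must be a power of two"
    lanes = list(values)
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance to the partner lane
            # Every lane reads its partner's value "at the same time".
            partner = [lanes[i ^ j] for i in range(n)]
            new_lanes = []
            for i in range(n):
                ascending = (i & k) == 0   # sort direction of this block
                lower = (i & j) == 0       # is this the lower lane of the pair?
                keep_min = ascending == lower
                a, b = lanes[i], partner[i]
                new_lanes.append(min(a, b) if keep_min else max(a, b))
            lanes = new_lanes
            j //= 2
        k *= 2
    return lanes
```

Each inner step is a pure "read partner, keep min or max" operation with no shared scratch buffer, which is why the pattern maps naturally onto cross-lane hardware.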

That's just the reality of the situation right now for anyone looking into AMD Compute. I'm hopeful that the situation will change as AMD works on fixing bugs and developing (there have been a LOT of development items pushed to their GitHub repo in the past year). But there's just so much software to be written to have AMD catch up to NVidia. Not just code, but also documentation of their GPUs.

2 comments

keldaris 2707 days ago

From my perspective (computational physics, not machine learning) the situation with GPU compute is very simple. If you are fine writing everything from scratch and won't need the CUDA ecosystem (which is really all there is for good sparse matrix, linear algebra, etc. support), write OpenCL 1.2 (or even GLSL if it's a visualization-heavy code with relatively simple compute) and buy whatever gets you the best compute/$ at that time. Otherwise - and this probably includes most people in this space - you have no choice but to keep using CUDA. There is just no meaningful compute ecosystem for AMD GPUs, sadly.

I'm still very much looking forward to the Radeon VII due to the memory bandwidth, since I'm currently working on bandwidth-constrained CFD simulations. But that's a specific use case and I write most things from scratch anyway.

dragontamer 2707 days ago

AMD's hardware is stupid-good from a compute perspective. Vega64 is $399, but renders Blender (on AMDGPU-PRO drivers) incredibly fast, like 2080 or 1080 Ti level. That's basically the main use case I bought a Vega for (which is why I'm very disappointed in ROCm's current bug which breaks Blender).

If you really can use those 500GB/s HBM2 stacks + 10+ TFlops of power, the Vega is absolutely a monster, at far cheaper prices than the 2080.
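As a rough illustration of what "really using" both numbers means, the standard roofline estimate can be sketched in a few lines of Python. The defaults below simply reuse the round figures from the comment (500 GB/s, 10 TFLOPS) and are not measured values; the function names are made up for this example.

```python
def attainable_tflops(flop_per_byte, peak_tflops=10.0, bandwidth_gb_s=500.0):
    """Roofline estimate: performance is capped by compute or by memory traffic."""
    bandwidth_tb_s = bandwidth_gb_s / 1000.0
    return min(peak_tflops, bandwidth_tb_s * flop_per_byte)


def ridge_point(peak_tflops=10.0, bandwidth_gb_s=500.0):
    """Arithmetic intensity (FLOP per byte) at which a kernel stops being bandwidth-bound."""
    return peak_tflops * 1000.0 / bandwidth_gb_s


if __name__ == "__main__":
    print(ridge_point())            # 20.0 FLOP per byte
    print(attainable_tflops(0.25))  # 0.125 TFLOPS for a streaming-style kernel
```

Under these assumed figures, a kernel needs roughly 20 floating-point operations per byte moved before the compute units, rather than the memory system, become the limit.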

I really wonder why video games FPS numbers are so much better on NVidia. The compute power is clearly there, but it just doesn't show in FPS tests.

---

Anyway, my custom code tests are to try and build a custom constraint-solver for a particular game AI I'm writing. Constraint solvers share similarities to Relational Databases (in particular: the relational join operator) which has been accelerated on GPUs before.
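To show the connection between constraints and relational joins, here is a minimal Python sketch of a hash join. Each constraint is treated as a relation listing the allowed value combinations, and joining two relations on a shared variable keeps only the assignments that satisfy both. The function name and the toy relations are hypothetical, and this is a plain CPU version, not the GPU implementation discussed in the comment.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Join two lists of dict rows on `key` using a hash table built from `left`."""
    index = defaultdict(list)
    for row in left:
        index[row[key]].append(row)
    joined = []
    for row in right:
        # Probe phase: only rows whose key matches survive.
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


if __name__ == "__main__":
    # Constraint A allows these (x, y) pairs; constraint B allows these (y, z) pairs.
    allowed_xy = [{"x": 1, "y": 2}, {"x": 2, "y": 3}]
    allowed_yz = [{"y": 2, "z": 5}, {"y": 4, "z": 6}]
    print(hash_join(allowed_xy, allowed_yz, "y"))  # [{'x': 1, 'y': 2, 'z': 5}]
```

The build and probe phases are each a loop over independent rows, which is the property that makes joins a reasonable fit for GPU acceleration.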

So I too am a bit fortunate that my specific use cases actually enable me to try ROCm. But any "popular" thing (Deep Learning, Matrix Multiplications, etc. etc.) benefits so heavily from CUDA's ecosystem that it's hard to say no to NVidia these days. CUDA is just more mature, with more libraries that help the programmer.

AMD's system is still "some assembly required", especially if you run into a compiler bug or care about performance... (gotta study up on that Vega ISA...) And unfortunately, GPU Assembly language is a fair bit more mysterious than CPU Assembly Language. But I expect any decent low-level programmer to figure it out eventually...

microcolonel 2707 days ago

I agree, and I'd add that VII is probably going to be a lot better. There are some pretty big benefits to the open drivers as well (which can be used for OpenGL, even if you use the AMDGPU-PRO OpenCL, which is probably wise if OpenCL is what you want to do).

As one example, I have a recurring task that runs on my GPU in the background, and I sleep next to the computer that does that. Since I don't want it to be too noisy, and it is acceptable for it to take longer to run while I'm asleep, I have a cron job which changes the power cap through sysfs to a more reasonable 45W (and at those levels, it's much more efficient anyhow, especially with my tuned voltages) at night.
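A minimal sketch of what such a power-cap script might look like in Python. It assumes the amdgpu hwmon interface, where the `power1_cap` file takes a value in microwatts; the example path is an assumption (the card and hwmon indices differ between systems), the function name is made up, and writing to the file normally requires root.

```python
from pathlib import Path

# Assumed example location; the card and hwmon indices vary per machine.
EXAMPLE_CAP_FILE = "/sys/class/drm/card0/device/hwmon/hwmon0/power1_cap"


def set_power_cap(cap_file, watts):
    """Write a power cap, given in watts, to a sysfs file that expects microwatts."""
    microwatts = round(watts * 1_000_000)
    Path(cap_file).write_text(f"{microwatts}\n")
    return microwatts


if __name__ == "__main__":
    import sys
    # Usage (as root): python3 powercap.py 45
    set_power_cap(EXAMPLE_CAP_FILE, float(sys.argv[1]))
```

A cron job could then run the script with 45 at night and with the normal cap again in the morning.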

> I really wonder why video games FPS numbers are so much better on NVidia. The compute power is clearly there, but it just doesn't show in FPS tests.

Drivers are hard, and AMD has sorta just been getting around to doing them well. The Mesa OpenGL drivers are usually faster than AMDGPU-PRO at OpenGL, and RADV is often faster than AMDGPU-PRO Vulkan (and AMDVLK).

I've been hoping these last few years that AMD would try to ship Mesa on Windows (i.e., add a state tracker for the low level APIs underlying D3D), and save themselves the effort. As far as I can tell, there is no IP issue preventing them from doing that (including if they have to ship a proprietary version with some code they don't own). There still seems to be low-hanging fruit in Mesa, but the performance is already usually better.

mangix 2707 days ago

https://github.com/hashcat/hashcat has some assembly optimizations. They look fairly readable.

Rychard 2707 days ago

I bought my 1080 Ti just over a year ago (December 2017) from Newegg for $750. (Newegg item N82E16814126186)

I'm glad AMD is finally catching up, but a savings of only $51 an entire year later doesn't exactly sound like a particularly great deal to me.

dragontamer 2707 days ago

Welcome to the end of Moore's Law. 7nm is as expensive as 14nm was. Sure, you gained double the density, but it costs twice as much to make. So you only get improved performance / watt. Cost per transistor stayed equal in this 7nm generation.

NVidia's 2080 (roughly equivalent to the 1080 Ti) is also $699 to $799, depending on which model you get. It's the nature of how the process nodes work now.

-----------

Rumor is that the lower-end of the market will get price/performance upgrades, as maybe small-7nm chips will have enough yield to actually give cost-savings. But that's a bit of "hopes and dreams" leaking in, as opposed to any hard data.

For now, it is clear that 300mm^2 7nm chips (like the Radeon VII) are going to be costly. Probably due to low yields, but it's hard to know for sure. Note that Zen2 and Apple chips are all at around 100mm^2 or so (which seems to indicate that yields are fine for small chips... but even then, Apple's phones definitely increased in price as they hit 7nm).
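To illustrate why die size matters so much for yield, here is a small Python sketch using the simple Poisson yield model, Y = exp(-A * D0). The defect density of 0.5 defects per cm^2 is a purely hypothetical value chosen for illustration; real 7nm defect densities are not public, so only the trend is meaningful, not the absolute numbers.

```python
import math


def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of defect-free dies under the Poisson yield model."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    d0 = 0.5  # hypothetical defect density, defects per cm^2
    for area in (100, 300):
        print(f"{area} mm^2: {poisson_yield(area, d0):.0%} yield")
```

With this assumed defect density the model gives about 61% for a 100mm^2 die and about 22% for a 300mm^2 die. Tripling the area cubes the yield fraction, so large dies get disproportionately expensive.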
