by john_moscow 2040 days ago
I am more wondering why AMD hasn't massively invested in porting common ML frameworks to OpenCL. Nvidia has outrageous margins on their datacenter GPUs. They've even banned the use of lower-margin gamer-oriented GPUs in datacenters [0]. Given that tensor arithmetic is essentially an easily abstractable commodity, I just don't understand why they don't offer a drop-in replacement.

Most users won't care what hardware their PyTorch model runs on in the cloud. All that matters for them is dollars per training epoch (or cents per inference). This could be a steal for an alternate hardware vendor.
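
To make the dollars-per-epoch comparison concrete, here is a minimal sketch. The hourly prices and epoch times are made-up placeholders, not real quotes from any vendor, and `cost_per_epoch` is a hypothetical helper.

```python
def cost_per_epoch(hourly_price_usd: float, epoch_seconds: float) -> float:
    """Cost of one training epoch: hourly instance price times hours per epoch."""
    return hourly_price_usd * epoch_seconds / 3600.0


# Placeholder numbers for illustration only.
vendor_a = cost_per_epoch(hourly_price_usd=3.00, epoch_seconds=600)  # faster, pricier
vendor_b = cost_per_epoch(hourly_price_usd=2.00, epoch_seconds=780)  # slower, cheaper

print(f"vendor A: ${vendor_a:.2f} per epoch")
print(f"vendor B: ${vendor_b:.2f} per epoch")
print("cheaper per epoch:", "A" if vendor_a < vendor_b else "B")
```

The point of the sketch is that a slower card can still win on this metric if its hourly price is low enough.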

[0] https://web.archive.org/web/20201109023551/https://www.digit...

6 comments

Because AMD's OpenCL tooling sucks. Set the -O (optimization) flag on your OpenCL code on AMDPro drivers, and malformed code comes out. AMDPro OpenCL 2.0 doesn't support debugging outside of printf statements (and lol at reading through 1024 SIMD threads' worth of printf statements every time you wanna figure something out).

Compiler bugs aplenty: the compiler can enter infinite loops just trying to compile OpenCL 2.0 code, taking down your program. If you ever come across such a bug, you're now in guess-and-check mode to figure out exactly what grammar you used to bork the OpenCL compiler.

Oh, and the OpenCL compiler is in the device driver. As soon as your customers update to Radeon 19.x.x.whatever, then you have a new OpenCL compiler with new bugs and/or regressions. The entire concept of tying the COMPILER to the device driver is insane. Or you get support tickets along the lines of "I get an infinite loop on Radeon 18.x.x.y drivers", and now you have to have if(deviceDriver == blah) scattered across your code to avoid those situations.
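
For illustration, the kind of driver gating described above might look like the sketch below. The version strings, the known-bad range, and the helper names are all hypothetical; a real application would read the driver version from the OpenCL runtime and maintain its own list of broken releases.

```python
from typing import List, Tuple

Version = Tuple[int, int, int]

# Hypothetical list of driver ranges (inclusive) known to miscompile our kernels.
KNOWN_BAD_RANGES: List[Tuple[Version, Version]] = [
    ((18, 0, 0), (18, 99, 99)),
]


def parse_driver_version(text: str) -> Version:
    """Turn a string like '19.7.2' into (19, 7, 2), padding missing parts with 0."""
    parts = []
    for piece in text.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if digits:
            parts.append(int(digits))
    padded = (parts + [0, 0, 0])[:3]
    return (padded[0], padded[1], padded[2])


def needs_workaround(driver_version: str) -> bool:
    """True if the driver falls inside any known-bad range."""
    version = parse_driver_version(driver_version)
    return any(low <= version <= high for low, high in KNOWN_BAD_RANGES)


for reported in ("18.5.1", "19.7.2"):
    path = "fallback kernel" if needs_workaround(reported) else "normal kernel"
    print(reported, "->", path)
```

Keeping the bad ranges in one table at least avoids scattering version checks across the codebase, though it does nothing about the underlying problem of the compiler changing under you.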

In practice, you end up staying on OpenCL 1.2, which is stable and has fewer bugs... and has a functional debugger and profiler. But now you're missing roughly 8 years' worth of features that have been added to GPUs over the last decade.

----------

ROCm OpenCL is decent, but that's ROCm. At that point, you might as well be using HIP, since HIP is just a way easier programming language to use.

Ultimately, I think if you're serious about AMD GPU coding, you should move on to ROCm. Either ROCm/OpenCL, or ROCm/HIP.

ROCm is statically compiled: the compiler is Clang/LLVM and completely compiled on your own workstation. If you distribute the executable, it works. Optimization flags work, there's a GDB interface to debug code. Like, you have a reasonable development environment.

So long as your card supports ROCm (admittedly: not many cards are supported, but... AMDPro OpenCL tooling is pretty poor).

Right. I think the question though is why isn't somebody fixing this situation? There's money sitting on the table for them when they figure it out and get their act together.

> I think the question though is why isn't somebody fixing this situation?

They are. It's called "Use ROCm". TensorFlow support, PyTorch support, etc. etc.

Yeah, it's limited to Linux, it's limited to a few cards. But within those restrictions, ROCm does work.
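
As a rough sketch of what this looks like from the user side: as far as I know, ROCm builds of PyTorch set `torch.version.hip` and still expose the GPU through the usual `torch.cuda` API, so model code written for CUDA generally needs no device-string changes. The helper below is hypothetical, and it takes the module as an argument so it can be tried with a stand-in when PyTorch is not installed.

```python
from types import SimpleNamespace


def describe_backend(torch_module) -> str:
    """Return 'rocm', 'cuda', or 'cpu' based on the build's version metadata."""
    version = torch_module.version
    if getattr(version, "hip", None):   # a version string on ROCm builds
        return "rocm"
    if getattr(version, "cuda", None):  # a version string on CUDA builds
        return "cuda"
    return "cpu"


# Stand-in objects (not real PyTorch) so the sketch runs anywhere.
fake_rocm_build = SimpleNamespace(version=SimpleNamespace(hip="x.y", cuda=None))
fake_cuda_build = SimpleNamespace(version=SimpleNamespace(hip=None, cuda="x.y"))
fake_cpu_build = SimpleNamespace(version=SimpleNamespace(hip=None, cuda=None))

for name, build in [("rocm", fake_rocm_build), ("cuda", fake_cuda_build), ("cpu", fake_cpu_build)]:
    print(name, "->", describe_backend(build))
```

With a real install you would call `describe_backend(torch)` after `import torch`.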

As far as I know, AMD doesn't have an incentive to improve this limited offering because they don't have chips with a good enough cost-to-compute ratio to get people to buy them even if they did get ROCm/HIP/etc. working.

Frontier and El Capitan will be the first exascale systems on the planet (now that Project Aurora has slipped schedule).

Both with AMD MI100 providing the bulk of their compute. Frontier seems like it was given development boards of MI100, because AMD is talking about how they already ported some code over to the MI100 and tested it.

You're on HN so you're probably aware of the costs and difficulty involved in staffing an organization large enough to tackle these issues in an effective time frame.

Nvidia has quite a head start. You're not just talking about some simple driver support either. You're talking about runtime compilation/JIT (to target various flavors of HW), tooling support, library optimizations, API stability and maintenance... AMD can catch up, but unless they come up with a new approach it's going to take a long time and a lot of smart people to do so.

> AMD can catch up, but unless they come up with a new approach it's going to take a long time and a lot of smart people to do so.

I think they will. AMD has the challenger mindset. They rose from the ashes and now actually compete with Intel and they can tackle NVIDIA as well.

ROCm-built executables aren't targeted at an IR, which means pain for the user at each switch to a newer GPU architecture, since you have to distribute binaries again.

Also, no Windows support whatsoever...

AMD is trying to do it, but they were late to the party to start with. They maintain compatible versions of PyTorch and TensorFlow, among others. They support only Linux.

https://www.amd.com/en/graphics/servers-solutions-rocm-ml

https://rocmdocs.amd.com/en/latest/

Linux-only support is a bit of a letdown for me. My clients insist on using Windows, so I'm left without a choice there: it's Nvidia or Nvidia.

It might be that you could get those up and running in WSL 2 once the GPU passthrough functionality comes to general availability (currently it's only in Insider builds) [1], but don't quote me on that as it's just a guess!

[1] https://docs.microsoft.com/en-us/windows/win32/direct3d12/gp...

Okay, gonna take a look at that once it's available.

Rather than OpenCL, they have invested in HIP, which can target AMD or NVIDIA cards.

Can HIP run on AMD consumer Radeon cards? I'm trying to find the best option to write GPGPU code that runs on other people's machines with hardware I have no control over. I thought OpenCL would become the best way to write code that could run on all PCs and mobile phones, but from my research the GPGPU landscape looks more fragmented with each year.

Not officially supported... and no ROCm/HIP at all on Windows.

OpenCL is supported via ROCm on the new consumer GPUs now, so if the stars align HIP support might come too. The lack of announcements doesn't inspire much confidence though, especially compared to NVIDIA's always-on PR machine.

Is HIP continually evolving à la CUDA? For instance, every new version of CUDA seems to jam in new features to get things to launch faster, use less memory, be easier to use, etc.

It doesn't work on their new GPUs and many others. It's Linux-only, etc.

They have invested and it works pretty well. But CUDA has such a huge lead and is the default. Most users don't care about the hardware, but they also don't care enough about the cost. And the really price-sensitive users are running their own clusters under their desks using gaming GPUs.

A more detailed explanation (ROCm section) https://timdettmers.com/2020/09/07/which-gpu-for-deep-learni...

According to my research and others' comments, OpenCL is a mess, so filled with boilerplate that it is just an interface to each company's own approach. CUDA actually encapsulates GPU computation.

AMD has HIP, which is closer to CUDA, but HIP seems less developed.

But I believe the basic problem is AMD doesn't make sufficiently high end GPUs to compete with Nvidia in ML.

Personally, I don't care what happens in the cloud, just what I can buy. I would note Nvidia does have competition in the cloud from Google's TPUs and I assume any large cloud vendor is going to negotiate with Nvidia. While I'd love AMD to be cost-effective for ML somewhere, it seems they aren't 'cause that's not what they're targeting.

As someone who worked deeply with OpenCL at one point, I'd say it's because it went off the rails and became a useless hodge-podge. They recently announced that V3 is basically a reversion to 1.2, with the 2.x features made optional, and that is the first good decision they've made in a while.