by john_moscow
2040 days ago
I'm more wondering why AMD hasn't massively invested in porting common ML frameworks to OpenCL. Nvidia has outrageous margins on their datacenter GPUs. They've even banned the use of lower-margin gamer-oriented GPUs in datacenters [0]. Given that tensor arithmetic is essentially an easily abstractable commodity, I just don't understand why AMD doesn't offer a drop-in replacement. Most users won't care what hardware their PyTorch model runs on in the cloud. All that matters to them is dollars per training epoch (or cents per inference). This could be a steal for an alternate hardware vendor.
[0] https://web.archive.org/web/20201109023551/https://www.digit...
|
Compiler bugs aplenty: the compiler can enter infinite loops just trying to compile OpenCL 2.0 code, taking down your program. If you ever come across such a bug, you're now in guess-and-check mode to figure out exactly which construct in your code borked the OpenCL compiler.
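One way to contain a hanging compile is to do the build in a child process with a timeout, so a hang costs you the child rather than the whole program. This is a minimal sketch, assuming the kernel build can be moved into a separate command. `run_with_timeout` is a hypothetical helper, and the two child commands are stand-ins, not a real OpenCL build:

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout_s):
    """Run cmd in a child process; return True if it exits with status 0 in time.

    On timeout, subprocess.run kills the child, so a compiler stuck in an
    infinite loop cannot hang the calling program.
    """
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0

# Stand-ins for "compile this kernel": one child that finishes, one that never does.
FINISHES = [sys.executable, "-c", "pass"]
HANGS = [sys.executable, "-c", "while True: pass"]
```

It doesn't tell you which construct triggered the hang, but it does let you bisect a kernel automatically instead of killing your program by hand each time.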
Oh, and the OpenCL compiler is in the device driver. As soon as your customers update to Radeon 19.x.x.whatever, you have a new OpenCL compiler with new bugs and/or regressions. The entire concept of tying the COMPILER to the device driver is insane. Or you get support tickets along the lines of "I get an infinite loop on Radeon 18.x.x.y drivers", and now you have to scatter if(deviceDriver == blah) across your code to avoid those situations.
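The if(deviceDriver == blah) checks can at least live in one table instead of being scattered around. This is a rough sketch only: the "known bad" range is made up for illustration, and the driver string format is an assumption (in a real program it would come from clGetDeviceInfo with CL_DRIVER_VERSION, and what it looks like depends on the driver). `parse_driver_version` and `build_options_for` are hypothetical helper names:

```python
import re

# Hypothetical table of driver releases with a known OpenCL 2.0 compiler hang.
# Each entry is an inclusive (low, high) range of (major, minor) versions.
KNOWN_BAD_RANGES = [
    ((18, 0), (18, 99)),
]

def parse_driver_version(text):
    """Extract (major, minor) from the first 'N.N' found in a driver string."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return int(match.group(1)), int(match.group(2))

def build_options_for(driver_text):
    """Pick the OpenCL C standard flag, falling back to 1.2 on flagged drivers."""
    version = parse_driver_version(driver_text)
    for low, high in KNOWN_BAD_RANGES:
        if low <= version <= high:
            return "-cl-std=CL1.2"
    return "-cl-std=CL2.0"
```

The returned string is what you'd pass as the build options when compiling the kernel. It also means you have to maintain a 1.2 fallback path for every kernel, which is pretty much the point of the complaint.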
In practice, you end up staying on OpenCL 1.2, which is stable and has fewer bugs... and has a functional debugger and profiler. But now you're missing roughly 8 years' worth of features that have been added to GPUs over the last decade.
----------
ROCm OpenCL is decent, but that's ROCm. At that point, you might as well be using HIP, since HIP is just a way easier programming language to use.
Ultimately, I think if you're serious about AMD GPU coding, you should move on to ROCm. Either ROCm/OpenCL, or ROCm/HIP.
ROCm is statically compiled: the compiler is Clang/LLVM and your code is compiled entirely on your own workstation. If you distribute the executable, it works. Optimization flags work, and there's a GDB interface to debug code. Like, you have a reasonable development environment.
So long as your card supports ROCm, that is (admittedly, not many cards are supported, but... AMDPro OpenCL tooling is pretty poor).