by dragontamer 2982 days ago
> OpenCL is not a disaster at all.

I'll probably need to be more specific. OpenCL 1.0 through 1.2 are fine, but they fell hopelessly behind NVidia's CUDA efforts. CUDA has more features, and those features lead to proven performance enhancements.

OpenCL 2.0 was the "counterpunch" to bring OpenCL up to CUDA-level features. However, OpenCL 2.0 is virtually stillborn. Only Intel and AMD platforms support OpenCL 2.0. Intel's Xeon Phi cards are relatively niche, and their primary advantage seems to be x86 code compatibility anyway, so I doubt you'd be running OpenCL on them.

AMD's OpenCL 2.0 support exists, but is rather poor. The OpenCL 2.0 debugger is simply non-functional, so you're forced to fall back on (lol) printfs.

That leaves OpenCL 1.2. It's okay, but it is years behind what modern hardware can do. Its atomic + barrier model is strange compared to proper C++11 atomics, and it's missing important features like device-side queuing, shared virtual memory, and a unified address space (no more copy/pasting code just to go from "local" to "private" memory), among other very useful features.

> Even today OpenCL is a viable solution for GPU

OpenCL 1.2 is a viable solution. An old, crusty, and quirky solution, but viable nonetheless. OpenCL 2.0+ is basically dead. And I think only Intel Xeon Phi supports the latest OpenCL 2.2.

I bet you there are more Vulkan compute shaders out there than there are OpenCL 2.0 kernels. Indeed, there are rumors that the Khronos Group is going to be focusing on Vulkan compute shaders in the future.

> The "single source" argument is completely overrated. Furthermore, you can have single source in OpenCL putting the code in strings.

I like my compile-time errors to happen at compile time, not at run time on my client's system. Compiler bugs in AMD drivers are fixed through device driver updates (!!!), which makes deploying plain-text OpenCL source code far more of a hassle in practice.

Consider this horror story: a compiler bug in some AMD device driver versions that causes a segfault on some hardware versions. This is not theoretical: https://community.amd.com/thread/160362.

In practice, deploying OpenCL 1.2 code requires you to test against all of the device driver versions your client base can reasonably be expected to run.
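
To make the "code in strings" problem concrete, here is a minimal host-side sketch. The kernel, the `factr` typo, and the names `kScaleKernelSource` / `scale_kernel_source` are all made up for illustration. The point is only that the host C++ compiler treats the kernel as an opaque string literal, so the undeclared identifier goes unnoticed until the driver's compiler sees the text (via `clCreateProgramWithSource` / `clBuildProgram`) on whatever machine and driver version the user happens to have.

```cpp
#include <cassert>
#include <string>

// Hypothetical OpenCL C kernel embedded the "code in strings" way.
// Note the typo: `factr` is never declared. The host compiler does not
// parse this text, so the file still compiles cleanly.
static const char* const kScaleKernelSource = R"CLC(
__kernel void scale(__global float* data, const float factor) {
    const size_t i = get_global_id(0);
    data[i] = data[i] * factr;
}
)CLC";

// Returns the text that would be handed to clCreateProgramWithSource and
// clBuildProgram at run time. Only at that point would the driver's
// compiler reject the undeclared identifier.
std::string scale_kernel_source() {
    assert(kScaleKernelSource != nullptr);
    return std::string(kScaleKernelSource);
}
```

This builds without a single diagnostic on the developer's machine, which is exactly the complaint: the error is still in there, it just shows up later and somewhere else.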

-----

But that's not the only issue.

"Single Source" means that you can define a single structure in a single .h file and actually have it guaranteed to work between CPU code and GPU code. Data-sharing code is grossly simplified, and both sides are perfectly matched.

The C++ AMP model (which has been adopted into AMD's ROCm platform) is grossly superior. You specify a few templates and bam, your source code automatically turns into CPU code OR GPU code. That's extremely useful when sharing routines between the CPU and GPU (like packing data into, or unpacking it from, the buffers).
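
As a rough illustration of the shared-header idea, here is a plain host-only C++ sketch. It is not actual HCC or C++ AMP code, and the `Particle` struct and the `pack_particle` / `unpack_particle` helpers are hypothetical names. In a single-source toolchain, both the CPU side and the GPU side would compile this one definition, so the layout checks only have to hold once. With kernels in strings, the struct has to be retyped inside the kernel text and kept in sync by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical record shared between host and device code.
struct Particle {
    float         position[4];  // xyz plus one float of padding
    float         velocity[4];
    std::uint32_t id;
    std::uint32_t flags;
};

// Layout checks: if someone edits the struct, the build breaks here
// instead of the buffer contents silently going out of sync.
static_assert(sizeof(float) == 4, "sketch assumes 32-bit float");
static_assert(std::is_standard_layout<Particle>::value, "must be standard layout");
static_assert(std::is_trivially_copyable<Particle>::value, "must be memcpy-safe");
static_assert(sizeof(Particle) == 40, "unexpected padding in Particle");
static_assert(offsetof(Particle, id) == 32, "unexpected offset for id");

// Copies one record into a raw byte buffer (e.g. a staging buffer).
void pack_particle(const Particle& p, unsigned char* buffer) {
    assert(buffer != nullptr);
    std::memcpy(buffer, &p, sizeof(Particle));
}

// Reads one record back out of a raw byte buffer.
Particle unpack_particle(const unsigned char* buffer) {
    assert(buffer != nullptr);
    Particle p;
    std::memcpy(&p, buffer, sizeof(Particle));
    return p;
}
```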

With that said, AMD clearly cares about OpenCL, and the ROCm platform looks like it will strongly support OpenCL through the near term, especially OpenCL 1.2, which seems to have a big codebase.

However, if I were to do any project these days, I'd do it in ROCm's HCC / single-source C++ system or CUDA. OpenCL 1.2 is useful for high-compatibility but has major issues as an environment.


The point I really wanted to make here is that OpenCL is only a disaster because NVidia was scared of the competition it would bring from AMD.
I'm sure NVidia deserves some blame.

But AMD drivers that cause OpenCL compiler segfaults and/or infinite loops are a problem that rests squarely on AMD's shoulders.

I have extensively used OpenCL on both AMD and NVidia for a few years and never had such problems. If anything, I found a few more bugs with NVidia.
Interesting. I'll take your anecdote for what it's worth.

My personal use case with OpenCL didn't seem to be going very well. I was testing on my personal R9 290X. While I didn't have the crashing / infinite loop bugs that other people had (see LuxRender's "Dave" for details: http://www.luxrender.net/forum/viewtopic.php?f=34&t=11009), my #1 issue was with the quality of AMD's OpenCL compiler.

In particular, the -O2 flag would literally break my code. I was doing some bit-level operations, and the results were just wrong under -O2. Meanwhile, -O0 was so unoptimized that my code was regularly spilling registers into and out of global memory, at which point the CPU was faster and there was no point in using OpenCL / GPU compute.

It seems like AMD's OpenCL implementation assumes that kernels will be very small and compact, and it seems to be better designed for floating-point ops. Other programmers online have also complained about AMD's bit-level operations returning erroneous results under -O2. My opinion of its compiler was... rather poor... based on my limited exposure. And further research seems to indicate that I wasn't the only one having these issues.
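
For anyone who wants to check their own setup, the usual way to catch this kind of miscompile is to compare what the device returns against a CPU reference. Below is a minimal sketch of that check. The 32-bit rotate is only an assumed stand-in for "bit-level operations", and `rotl32_reference` and `first_mismatch` are hypothetical helper names. The device outputs are passed in as a plain vector, so the sketch does not depend on any particular GPU API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference for a 32-bit rotate-left. The shift count is masked so
// that an undefined shift by 32 can never happen.
std::uint32_t rotl32_reference(std::uint32_t x, unsigned n) {
    n &= 31u;
    return n == 0 ? x : (x << n) | (x >> (32u - n));
}

// Compares values read back from the device against the CPU reference.
// Returns the index of the first mismatch, or -1 if everything agrees.
long first_mismatch(const std::vector<std::uint32_t>& inputs,
                    const std::vector<std::uint32_t>& device_outputs,
                    unsigned shift) {
    assert(inputs.size() == device_outputs.size());
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        if (rotl32_reference(inputs[i], shift) != device_outputs[i]) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

Running the same kernel once per optimization level and feeding each result through a check like this would at least show which flag and which inputs disagree with the reference.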

I only did, and still do, floating point for image processing. In fact, looking at my logs, I registered 5 bugs with NVidia in the last 2 years and none with AMD.