Hacker News
by fluxem 920 days ago
CUDA = C++ on GPUs. Compute shader = subset of C with weird quirks.
2 comments

There are existing efforts to compile SYCL to Vulkan compute shaders. Plenty of "weird quirks" involved since they're based on different underlying varieties of SPIR-V ("kernels" vs. "shaders") and seem to have evolved independently in other ways (Vulkan does not have the amount of support for numerical computation that OpenCL/SYCL has) - but nothing too terrible or anything that couldn't be addressed by future Vulkan extensions.
A subset that lacks pointers, which makes compute shaders a toy language next to CUDA.
Vulkan 1.3 has pointers, thanks to buffer device address[1]. It took a while to get there, and earlier pointer support was flawed. I also don't know of any major applications that use this.

Modern Vulkan is looking pretty good now. Cooperative matrix multiplication has also landed (as a widely supported extension), and I think it's fair to say it's gone past OpenCL.

I think it's too early to say whether we get significant adoption of all this, but it's a plausible foundation for real stuff. It's no longer just a toy.

[1] https://community.arm.com/arm-community-blogs/b/graphics-gam...

Is IREE the main runtime doing Vulkan or are there others? Who should we be listening to (oh wise @raphlinus)?

It's been awesome seeing folks like Keras 3.0 kicking out broad intercompatibility across JAX, TF, and PyTorch, powered by flexible execution engines. Looking forward to seeing more Vulkan-based runs getting socialized, benchmarked & compared. https://news.ycombinator.com/item?id=38446353

The two I know of are IREE and Kompute[1]. I'm not sure how much momentum the latter has; I don't see it referenced much. There's also a growing body of work that uses Vulkan indirectly through WebGPU. This is currently lagging in performance due to lack of subgroups and cooperative matrix multiplication, but I see that gap closing. There I think wonnx[2] has the most momentum, but I am aware of other efforts.

[1]: https://kompute.cc/

[2]: https://github.com/webonnx/wonnx

How feasible would it be to target Vulkan 1.3 or such from standard SYCL (as first seen in Sylkan, for earlier Vulkan Compute)? Is it still lacking the numerical properties for some math functions that OpenCL and SYCL seem to expect?
That's a really good question. I don't know enough about SYCL to be able to tell you the answer, but I've heard rumblings that it may be the thing to watch. I think there may be some other limitations, for example SYCL 2020 depends on unified shared memory, and that is definitely not something you can depend on in compute shader land (in some cases you can get some of it, for example with resizable BAR, but it depends).

In researching this answer, I came across a really interesting thread[1] on diagnosing performance problems with USM in SYCL (running on AMD HIP in this case). It's a good tour of why this is hard, and why for the vast majority of users it's far better to just use CUDA and not have to deal with any of this bullshit - things pretty much just work.

When targeting compute shaders, you pretty much have to manage buffers manually, and also do copying between host and device memory explicitly (when needed - on hardware such as Apple Silicon, you prefer to not copy). I personally don't have a problem with this, as I like things being explicit, but it is definitely one of the ergonomic advantages of modern CUDA, and one of the reasons why fully automated conversion to other runtimes is not going to work well.

[1]: https://stackoverflow.com/questions/76700305/4000-performanc...

> Vulkan 1.3 has pointers, thanks to buffer device address[1].

> [1] https://community.arm.com/arm-community-blogs/b/graphics-gam...

"Using a pointer in a shader - In Vulkan GLSL, there is the GL_EXT_buffer_reference extension "

That extension is utter garbage. I tried it. It was the last thing I tried before giving up on GLSL/Vulkan and switching to CUDA. It was the nail in the coffin that made me go "okay, if that's the best Vulkan can do, then I need to switch to CUDA". It's incredibly cumbersome, confusing and verbose.

What's needed are regular, simple, C-like pointers.
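
To make the contrast concrete, here's a minimal sketch in plain host-side C++ (not shader or CUDA code; the struct and function names are made up for illustration). It walks the same linked list two ways: through indices into a single buffer, which is roughly how you have to address data in a compute shader without pointers, and through ordinary C-like pointers, which is what CUDA device code lets you write.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical node layouts, for illustration only.

// Style 1: index-based links. Every node lives in one buffer and "next"
// is an index into that same buffer; -1 terminates the list.
struct IndexNode {
    int32_t value;
    int32_t next;
};

int sum_by_index(const std::vector<IndexNode>& buffer, int32_t head) {
    int total = 0;
    for (int32_t i = head; i != -1;
         i = buffer[static_cast<std::size_t>(i)].next) {
        total += buffer[static_cast<std::size_t>(i)].value;
    }
    return total;
}

// Style 2: plain C-like pointers. Nodes can live anywhere in memory and
// "next" is just an address; nullptr terminates the list.
struct PtrNode {
    int32_t value;
    const PtrNode* next;
};

int sum_by_pointer(const PtrNode* head) {
    int total = 0;
    for (const PtrNode* p = head; p != nullptr; p = p->next) {
        total += p->value;
    }
    return total;
}
```

Both functions compute the same sum. The difference is that the index version only works if everything sits in one known buffer, while the pointer version doesn't care where the nodes were allocated.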