by andrew_v4 1894 days ago
Just for contrast, it's interesting to look at an example of writing a similar kernel in Julia:

https://juliagpu.gitlab.io/CUDA.jl/tutorials/introduction/

I don't think it's possible to achieve something like this in Python because of how it's interpreted (though it sounds a bit like what another comment mentioned, where the Python was compiled to C).

4 comments

I think the contrast is probably less about the language and more about the scope and objective of the projects. The blog is describing low-level interfaces in Python; probably more comparable is the old CUDAdrv.jl package (now merged into CUDA.jl): https://github.com/JuliaGPU/CUDAdrv.jl/blob/master/examples/...

Here is a similar kernel written in Python with numba: https://github.com/ContinuumIO/gtc2017-numba/blob/master/4%2...

I gave numba CUDA a spin in late 2018 and was severely disappointed. It didn't work out of the box: I had to tweak the source to remove a reference to an API that had been removed from CUDA more than a year prior (and deprecated long before that). Then I ran into a bug when converting a float array to a double array. I had to declare the types three different times and it still did a naive byte-copy rather than a conversion. Thanks to a background in numerics, the symptoms were obvious, but yikes. The problem that finally did us in was an inability to get buffers to correctly pass between kernels without a CPU copy, which was absolutely critical for our perf. I think this was supported in theory but just didn't work.

In any case, we did a complete rewrite in CUDA proper in less time than we spent banging our heads against that last numba-CUDA issue.

Under every language bridge there are trolls and numba-CUDA had some mean ones. Hopefully things have gotten better but I'm definitely still inside the "once bitten twice shy" period.
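For anyone wondering what the "naive byte-copy rather than a conversion" symptom above looks like, here is a minimal sketch. It assumes nothing about numba itself (the comment doesn't say which calls were involved) and uses only the standard library `array` module; the variable names are made up for illustration.

```python
from array import array

# Four single-precision values, 4 bytes each on common platforms.
singles = array("f", [1.0, 2.0, 3.0, 4.0])

# Naive byte copy: the same raw bytes are reinterpreted as 8-byte doubles,
# so every pair of floats is fused into one meaningless double.
reinterpreted = array("d")
reinterpreted.frombytes(singles.tobytes())

# Value conversion: each float is widened to a double individually.
converted = array("d", singles.tolist())

print(len(singles), len(reinterpreted), len(converted))
print(list(reinterpreted))
print(list(converted))
```

The giveaway is that the reinterpreted array has half as many elements and values that bear no obvious relation to the inputs, while the converted array keeps both the length and the values.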

Same here. I switched over to CuPy from numba.cuda.

> Julia has first-class support for GPU programming

"First-class" is a steep claim. Does it support the NVIDIA perf tools? Those are very important for taking a kernel from (in my experience) ~20% of theoretical perf to ~90% of theoretical perf.

Yeah, see this section of the documentation: https://juliagpu.gitlab.io/CUDA.jl/development/profiling/. CUDA.jl also supports NVTX, wraps CUPTI, etc. The full extent of the APIs and tools is available.

Source line association when using PC sampling is currently broken, though, due to a bug in the NVIDIA drivers (segfaulting when parsing the PTX debug info emitted by LLVM). I'm told that may be fixed in the next driver.

Nice! I set a reminder to check back in a month.

> CUDAnative.jl also [...] generates the necessary line number information for the NVIDIA Visual Profiler to work as expected

That sounds very promising, but these tools are usually magnificent screenshot fodder, yet they are conspicuously absent from the screenshots, so I still have suspicions. Maybe I'll give it a try tonight and report back.

Here's a screenshot: https://julialang.org/assets/blog/nvvp.png. Or a recent PR where you can see NVTX ranges from Julia: https://github.com/JuliaGPU/CUDA.jl/pull/760

Thanks! Now I believe! :)

JAX and TensorFlow can both convert some Python code to equivalent XLA code or a TF graph.
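To make the "convert Python code to a graph" idea concrete, here is a toy tracing sketch in plain Python. It is not how JAX or TensorFlow are actually implemented, and every name in it (`Node`, `trace`, `evaluate`, `axpy`) is hypothetical. It only shows the general trick: call an ordinary Python function once with symbolic stand-ins and record the operations it performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One node of a recorded expression graph (toy, illustration only)."""
    op: str
    args: tuple = ()

    def __add__(self, other):
        return Node("add", (self, _lift(other)))

    def __radd__(self, other):
        return Node("add", (_lift(other), self))

    def __mul__(self, other):
        return Node("mul", (self, _lift(other)))

    def __rmul__(self, other):
        return Node("mul", (_lift(other), self))


def _lift(value):
    """Wrap plain Python numbers as constant nodes."""
    return value if isinstance(value, Node) else Node("const", (value,))


def trace(fn, *names):
    """Call fn once with symbolic inputs and return the recorded graph."""
    return fn(*(Node("input", (name,)) for name in names))


def evaluate(node, env):
    """Interpret a recorded graph; a real system would compile it instead."""
    if node.op == "input":
        return env[node.args[0]]
    if node.op == "const":
        return node.args[0]
    lhs, rhs = (evaluate(arg, env) for arg in node.args)
    return lhs + rhs if node.op == "add" else lhs * rhs


def axpy(a, x, y):
    # Ordinary Python; the operators are overloaded on Node.
    return a * x + y


graph = trace(axpy, "a", "x", "y")
result = evaluate(graph, {"a": 2.0, "x": 3.0, "y": 1.0})
print(graph.op, [arg.op for arg in graph.args])
print(result)
```

Once the graph exists as data, the Python interpreter is out of the loop, which is why this style only covers "some" Python: anything that cannot be expressed through the overloaded operations never makes it into the graph.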

I mentioned this in the response to the other comment, but straight compilation is exactly what numba does for CUDA support because, just like Julia, numba uses LLVM as a middle end (and LLVM has a PTX backend).