by andrew_v4 1894 days ago

Just for contrast, it's interesting to look at an example of writing a similar kernel in Julia:

https://juliagpu.gitlab.io/CUDA.jl/tutorials/introduction/

I don't think it's possible to achieve something like this in Python because of how it's interpreted (though it sounds a bit like what another comment mentioned, where the Python was compiled to C).

4 comments

rrss 1894 days ago

I think the contrast is probably less about the language, and more about the scope and objective of the projects. The blog describes low-level interfaces in Python; probably more comparable is the old CUDAdrv.jl package (now merged into CUDA.jl): https://github.com/JuliaGPU/CUDAdrv.jl/blob/master/examples/...

Here is a similar kernel written in Python with numba: https://github.com/ContinuumIO/gtc2017-numba/blob/master/4%2...

jjoonathan 1894 days ago

I gave numba CUDA a spin in late 2018 and was severely disappointed. It didn't work out of the box: I had to tweak the source to remove a reference to an API that had been removed from CUDA more than a year prior (and deprecated long before that). Then I ran into a bug when converting a float array to a double array. I had to declare the types three different times and it still did a naive byte-copy rather than a conversion. Thanks to a background in numerics, the symptoms were obvious, but yikes. The problem that finally did us in was an inability to get buffers to pass correctly between kernels without a CPU copy, which was absolutely critical for our perf. I think this was supported in theory but just didn't work.

In any case, we did a complete rewrite in CUDA proper in less time than we spent banging our heads against that last numba-CUDA issue.

Under every language bridge there are trolls and numba-CUDA had some mean ones. Hopefully things have gotten better but I'm definitely still inside the "once bitten twice shy" period.
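For readers who haven't met the float-to-double symptom described above, here is a minimal sketch of the difference between a byte-copy and a real conversion. It uses only Python's standard `array` module, not numba, and the variable names are made up for illustration; it assumes the usual 4-byte float and 8-byte double.

```python
from array import array

# Four single-precision values, as they might sit in a float32 buffer.
singles = array("f", [1.0, 2.0, 3.0, 4.0])

# Naive byte-copy: the same 16 bytes reinterpreted as 8-byte doubles.
# Each double swallows two neighbouring floats, so the length halves
# and the values no longer match the originals.
byte_copied = array("d")
byte_copied.frombytes(singles.tobytes())

# Real conversion: every element is widened to a double individually.
converted = array("d", singles.tolist())

print(len(byte_copied), list(byte_copied))
print(len(converted), list(converted))
```

The byte-copied array comes out half as long and holds values unrelated to the inputs, which is the kind of symptom that is easy to recognize once you have seen it.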

N1H1L 1893 days ago

Same here. I switched over to CuPy from numba.cuda.

jjoonathan 1894 days ago

> Julia has first-class support for GPU programming

"First-class" is a steep claim. Does it support the NVIDIA perf tools? Those are very important for taking a kernel from (in my experience) ~20% of theoretical perf to ~90% of theoretical perf.

maleadt 1894 days ago

Yeah, see this section of the documentation: https://juliagpu.gitlab.io/CUDA.jl/development/profiling/. CUDA.jl also supports NVTX, wraps CUPTI, etc. The full extent of the APIs and tools is available.

Source line association when using PC sampling is currently broken, though, due to a bug in the NVIDIA drivers (they segfault when parsing the PTX debug info emitted by LLVM). I'm told that may be fixed in the next driver.

jjoonathan 1894 days ago

Nice! I set a reminder to check back in a month.

klmadfejno 1894 days ago

https://developer.nvidia.com/blog/gpu-computing-julia-progra...

jjoonathan 1894 days ago

> CUDAnative.jl also [...] generates the necessary line number information for the NVIDIA Visual Profiler to work as expected

That sounds very promising, but these tools are usually magnificent screenshot fodder, yet they are conspicuously absent from the screenshots, so I still have suspicions. Maybe I'll give it a try tonight and report back.

maleadt 1894 days ago

Here's a screenshot: https://julialang.org/assets/blog/nvvp.png. Or a recent PR where you can see NVTX ranges from Julia: https://github.com/JuliaGPU/CUDA.jl/pull/760

jjoonathan 1894 days ago

Thanks! Now I believe! :)

albertzeyer 1894 days ago

JAX and TensorFlow functions would both convert some Python code to equivalent XLA code or a TF graph.
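Roughly, that conversion works by tracing: the Python function is run once on placeholder objects that record each operation instead of computing it. The toy below is only a sketch of that idea in plain Python. `Tracer`, `trace`, and `saxpy` are hypothetical names invented here, not JAX or TensorFlow APIs, and the real systems handle far more (shapes, dtypes, control flow).

```python
class Tracer:
    """Stand-in value that records operations instead of computing them."""

    def __init__(self, name, graph):
        self.name = name
        self.graph = graph

    def _record(self, op, other):
        other_name = other.name if isinstance(other, Tracer) else repr(other)
        out = Tracer(f"t{len(self.graph)}", self.graph)
        self.graph.append((out.name, op, self.name, other_name))
        return out

    def __add__(self, other):
        return self._record("add", other)

    def __mul__(self, other):
        return self._record("mul", other)


def trace(fn, *arg_names):
    """Run fn once on Tracer arguments and return the recorded op list."""
    graph = []
    fn(*(Tracer(name, graph) for name in arg_names))
    return graph


def saxpy(a, x, y):
    # Ordinary Python arithmetic; with Tracer inputs it only gets recorded.
    return a * x + y


for node in trace(saxpy, "a", "x", "y"):
    print(node)
```

The recorded list is a small dataflow graph (a multiply feeding an add) that a backend could then compile, which is why the interpreted nature of Python stops mattering once the trace exists.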

anon_tor_12345 1894 days ago

I mentioned this in the response to the other comment, but straight compilation is exactly what numba does for CUDA support because, just like Julia, numba uses LLVM as a middle end (and LLVM has a PTX backend).
