| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by ChrisRackauckas 2300 days ago
	You could just use LoopVectorization on the CPU side. It's been shown to match well-tuned C++ BLAS implementations, for example with the pure Julia Gaius.jl (https://github.com/MasonProtter/Gaius.jl), so you can follow that as an example for getting BLAS-speed CPU side kernels. For the GPU side, there's CUDAnative.jl and KernelAbstractions.jl, and indeed benchmarks from NVIDIA show that it at least rivals directly writing CUDA (https://devblogs.nvidia.com/gpu-computing-julia-programming-...), so you won't be missing anything just by learning Julia and sticking to using just Julia for researching new kernel implementations.

1 comments

p1esk 2299 days ago

In that benchmark, was Julia tested against CuDNN accelerated neural network CUDA code? If not, is it possible (and beneficial) to call CuDNN functions from Julia?

link

ChrisRackauckas 2298 days ago

That wasn't a benchmark with CuDNN since it was a benchmark about writing such kernels. However, Julia libraries call into optimized kernels whenever they exist, and things like NNLib.jl (the backbone of Flux.jl) and KNet.jl expose operations like `conv` that dispatch CuArrays to automatically use CuDNN.

link