Hacker News
by amkkma 2096 days ago
The GPU gap only shows up if the code is written in the high-level indexing or loop style. There is little to no gap if it is written either with array abstractions (broadcast, map, etc.) or at a level similar to CUDA C (though with nicer Julia abstractions and syntax): https://juliagpu.org/cuda/
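To make the two fast styles concrete, here is a rough sketch with CUDA.jl. The names `vadd!` and `demo` are made up for illustration, and I'm assuming the "index or loop style" means element-by-element indexing of a GPU array from host code, which is what the commented-out loop shows.

```julia
using CUDA

# Kernel style: close to CUDA C, but written in Julia.
# `vadd!` is a hypothetical name used only for this sketch.
function vadd!(c, a, b)
    # Global 1-based thread index.
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return nothing
end

function demo(n = 1_000_000)
    a = CUDA.fill(1.0f0, n)
    b = CUDA.fill(2.0f0, n)

    # Array-abstraction style: the broadcast runs as a GPU kernel.
    c_broadcast = a .+ b

    # Kernel style: launch the hand-written kernel over n elements.
    c_kernel = similar(a)
    threads = 256
    blocks = cld(n, threads)
    @cuda threads=threads blocks=blocks vadd!(c_kernel, a, b)

    # Assumed "loop style" (not run here): indexing a CuArray element by
    # element from the host, which CUDA.jl discourages as scalar indexing.
    # for i in 1:n
    #     c_kernel[i] = a[i] + b[i]
    # end

    # Copy both results back to the host and compare them.
    return Array(c_broadcast) == Array(c_kernel)
end

# Only run when a working CUDA GPU is available.
if CUDA.functional()
    @show demo()
end
```

Both versions compute the same thing; the broadcast is usually what you want unless you need control over the launch configuration.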

The Julia Lab at MIT is working on making the higher-level codegen faster.

1 comments

I guess that makes sense to me. You can just automatically convert the C in BLAS to Julia, and if they're both being converted to LLVM IR by Clang anyway, then I guess it'll be about as fast!
That's not at all what Julia is doing. It's much more sophisticated: it has very low-level intrinsic primitives that compose, it optimizes the IR to make it fast, and then it compiles that for CUDA. These intrinsics all map to Julia constructs.
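If you want to see that pipeline for yourself, CUDA.jl has reflection macros that dump what the compiler generates for a kernel. A minimal sketch, assuming a working CUDA GPU; `scale!` and `inspect_scale` are hypothetical names for illustration, not anything from the CUDA.jl docs:

```julia
using CUDA

# Hypothetical kernel for illustration: scale every element of x by s.
# threadIdx/blockIdx/blockDim are the low-level intrinsics exposed as
# ordinary Julia functions.
function scale!(x, s)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(x)
        @inbounds x[i] *= s
    end
    return nothing
end

function inspect_scale(n = 1024)
    x = CUDA.ones(Float32, n)
    # Print the LLVM IR generated for the kernel while launching it.
    @device_code_llvm @cuda threads=256 blocks=cld(n, 256) scale!(x, 2.0f0)
    # Copy the result back to the host.
    return Array(x)
end

# Only run when a working CUDA GPU is available.
if CUDA.functional()
    inspect_scale()
end
```

Swapping in `@device_code_ptx` should show the PTX stage instead of the LLVM IR.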