Julia has more than bindings to CUDA: native Julia code can be compiled directly to PTX kernels (https://cuda.juliagpu.org/stable/development/kernel/). The same code can also generate kernels for AMD GPUs, Intel GPUs, and Apple's Metal.
For example, we built software that generates kernels on demand, embedding user-supplied functions, for all four of these platforms, and showed that for certain nonlinear systems it is much faster than calling array functions through CUDA bindings (https://www.sciencedirect.com/science/article/abs/pii/S00457...).
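
To make the single-source point above concrete, here is a minimal sketch of a vendor-neutral kernel. It assumes the KernelAbstractions.jl package, which the comment above does not name, and it is not taken from the linked paper. The kernel name `saxpy!`, the array sizes, and the workgroup size of 64 are illustrative choices. It runs on the CPU backend so it can be tried without a GPU.

```julia
# Assumption: KernelAbstractions.jl is installed; it is not mentioned in the text above.
using KernelAbstractions

# An ordinary Julia function body, written once and compiled per backend.
# `saxpy!` is a hypothetical name chosen for this sketch.
@kernel function saxpy!(y, a, @Const(x))
    i = @index(Global)               # global linear index of this work-item
    @inbounds y[i] = a * x[i] + y[i]
end

backend = CPU()                      # CPU backend, so no GPU is required
x = ones(Float32, 1024)
y = fill(2f0, 1024)

kernel! = saxpy!(backend, 64)        # instantiate for this backend, workgroup size 64
kernel!(y, 3f0, x; ndrange = length(y))
KernelAbstractions.synchronize(backend)
```

With the matching GPU package loaded, swapping `CPU()` for that package's backend object (for instance `CUDABackend()` from CUDA.jl) and allocating `x` and `y` as device arrays should be the only changes needed, though I have only described the CPU path here.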