If you're an advanced library writer, you can leverage KernelAbstractions.jl: https://juliagpu.github.io/KernelAbstractions.jl/stable/#Wri... It lets you write kernels in Julia that compile efficiently together with the rest of your native Julia code, and they work cross-vendor!
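To give a feel for what "write a kernel in Julia" means here, below is a minimal sketch. It assumes the KernelAbstractions.jl 0.9-style API (`@kernel`, `@index`, `get_backend`, `KernelAbstractions.synchronize`), and the kernel name `scale!` is purely hypothetical. It runs on the CPU backend, but the same kernel source should run on a GPU once the array is, say, a `CuArray`, because the backend is picked from the array type.

```julia
using KernelAbstractions

# Hypothetical example kernel: multiply every element of A by α in place.
# The body is plain Julia; KernelAbstractions compiles it for the backend it is launched on.
@kernel function scale!(A, α)
    I = @index(Global)            # global (linear) index of this work-item
    @inbounds A[I] = α * A[I]
end

A = ones(Float32, 1024)
backend = get_backend(A)          # CPU() for a plain Array; a GPU array would yield its vendor's backend
kernel! = scale!(backend, 64)     # instantiate the kernel for this backend with a workgroup size of 64
kernel!(A, 2f0; ndrange = length(A))
KernelAbstractions.synchronize(backend)  # wait for the launch to finish before reading A
```

The design point is that only the array type changes between vendors; the kernel definition and launch code stay the same.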
Back to multi-GPU: it seems Oceananigans has something for this (https://clima.github.io/OceananigansDocumentation/stable/app...), which is MPI-based?