The difference is not just about APIs. CUDA has a single-source model (host and device code live in the same file) that is dead easy to use, whereas last I checked every competitor still had an outdated manual kernel-loading process that added significant friction.
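For anyone who has not seen it, here is a minimal sketch of what single-source looks like in practice. The `scale` kernel, the array size, and the launch configuration are made up for illustration, and error checking is left out to keep it short:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device code: defined in the same .cu file as the host code below.
// (Hypothetical kernel, chosen only for illustration.)
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main() {
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) {
        host[i] = 1.0f;
    }

    // Allocate device memory and copy the input over.
    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // The kernel is launched much like a function call; user code has no
    // explicit "load the kernel source and build it" step.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(dev, 2.0f, n);

    // Copy the result back (this cudaMemcpy waits for the kernel to finish).
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("%f\n", host[0]);
    return 0;
}
```

Assuming the file is saved as `scale.cu`, `nvcc scale.cu -o scale` compiles both halves in one step. For contrast, classic OpenCL has the host program hand the kernel source to `clCreateProgramWithSource`, build it with `clBuildProgram`, fetch the kernel with `clCreateKernel`, and bind each argument with `clSetKernelArg` before it can launch anything.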
It is supposed to, yes. I was never able to set it up (admittedly, I have not tried in a couple of years, since I no longer work with GPUs), so I don't know how well it holds up.