Disclaimer: I work on AMD ROCm, but my opinions are my own.
There's also HIP[1], which can be used as a thin wrapper around CUDA, or with the ROCm backend on AMD platforms. It doesn't yet match CUDA in either breadth of features or maturity, but it's getting closer every day.
As I understand it, that has to work for the CORAL 2 US "exascale" systems, so people who've been proved fairly right so far obviously have some confidence in it. (de Supinski of Livermore said he'd be out of a job if the conventional wisdom was right, though it was pretty obvious at the time that it wasn't.)
Free software too, praise be.
CUDA seems nice, but being Nvidia-only makes it a total dead end.