Compare the effort AMD puts into OpenCL with what NVIDIA puts into CUDA and you'll see why everyone just uses NVIDIA.
I'm not a big fan of vendor "standards", but I have very limited sympathy for OpenCL here.
I think the best hope for portability is at the higher level programming API layer. For example TensorFlow is careful to make switching between CPU and GPU painless.
It would not have to be like that if NVIDIA opened up the source code for cuFFT/cuDNN/cuBLAS. My guess is they don't because porting code from CUDA to OpenCL is fairly trivial. It can even be automated.
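To give a rough idea of what "can be automated" means at the kernel level, here is a toy sketch in Python. The mapping table and the `translate_kernel` helper are made up for illustration, not taken from any real porting tool. It only swaps a handful of CUDA built-ins for their OpenCL C counterparts, and a real port would also need address-space qualifiers and rewritten host code.

```python
import re

# Hypothetical, deliberately incomplete mapping from CUDA kernel syntax
# to the corresponding OpenCL C built-ins (x dimension only).
CUDA_TO_OPENCL = {
    "__global__": "__kernel",
    "__shared__": "__local",
    "__syncthreads()": "barrier(CLK_LOCAL_MEM_FENCE)",
    "threadIdx.x": "get_local_id(0)",
    "blockIdx.x": "get_group_id(0)",
    "blockDim.x": "get_local_size(0)",
    "gridDim.x": "get_num_groups(0)",
}

# Longest keys first so no key can shadow a longer one sharing its prefix.
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(CUDA_TO_OPENCL, key=len, reverse=True))
)


def translate_kernel(cuda_src: str) -> str:
    """Replace known CUDA tokens with OpenCL equivalents (plain text substitution)."""
    return _PATTERN.sub(lambda m: CUDA_TO_OPENCL[m.group(0)], cuda_src)


cuda_kernel = """__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}"""

opencl_kernel = translate_kernel(cuda_kernel)
# Note: the result is not yet valid OpenCL C, because the pointer argument
# still needs a __global qualifier. The substitution does not handle that.
print(opencl_kernel)
```

The mechanical part really is this simple for kernels you have the source for. It does nothing for the closed libraries, which is the whole point of the comment above.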
Unfortunately, that ship sailed years ago. Also, NVIDIA is investing heavily in deep learning (cuDNN), so this won't happen unless someone on the OpenCL side builds something equally or more performant.
That could work, but Google doesn't seem to have a stake in OpenCL, so it's not a priority for them. NVIDIA has a stake in CUDA, so I wouldn't be surprised if it's a top business priority there.