CUDA itself does only one thing better - it lets you write kernels in C++ rather than a fairly restrictive subset of C. Far more importantly, it has a large ecosystem of tooling (including excellent profiling tools), libraries, documentation and open source projects out there. Comparatively speaking, OpenCL is a barren landscape.
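To make the C++ point concrete, here is a minimal sketch of a templated kernel, where one definition is instantiated for several element types. The names `scale` and `scale_on_gpu` are made up for illustration, and error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Templated kernel: a single definition serves float, double, int, ...
template <typename T>
__global__ void scale(T* data, T factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical host helper (illustrative name): copies the input to the
// device, runs the kernel, and copies the result back.
// Error checking is omitted for brevity.
template <typename T>
std::vector<T> scale_on_gpu(std::vector<T> host, T factor) {
    int n = static_cast<int>(host.size());
    if (n == 0) return host;

    T* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(T));
    cudaMemcpy(dev, host.data(), n * sizeof(T), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover all n
    scale<T><<<blocks, threads>>>(dev, factor, n);

    // The copy back runs on the default stream, so it waits for the kernel.
    cudaMemcpy(host.data(), dev, n * sizeof(T), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

Calling `scale_on_gpu<float>(...)` and `scale_on_gpu<int>(...)` instantiates the same kernel source for both types.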
That it does, and there are also plenty of convenience wrappers of various kinds for different languages. Unfortunately, none of that can address the ecosystem issue, which is the one that really matters.
CUDA has Cooperative Groups now on Volta and Turing architectures. This allows for synchronization between entire workgroups rather than just locally. So you can pretty much keep your entire job on the GPU even if it involves multiple kernels. Really important for complex jobs where performance is a must.
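A rough sketch of what that looks like: two phases that would normally be separate kernels are fused into one, with a grid-wide barrier in between. The kernel and helper names are invented for illustration. The helper queries `cudaDevAttrCooperativeLaunch` at runtime rather than assuming a particular architecture, and older toolkits may additionally need relocatable device code (`-rdc=true`) for grid sync. Error checking is omitted.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Phase 1 squares every element; phase 2 reads an element that a different
// block may have written, so a grid-wide barrier is needed in between.
__global__ void square_then_reverse(float* buf, float* out, int n) {
    cg::grid_group grid = cg::this_grid();
    int stride = gridDim.x * blockDim.x;
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    for (int i = tid; i < n; i += stride) buf[i] = buf[i] * buf[i];

    grid.sync();  // every thread of every block waits here

    for (int i = tid; i < n; i += stride) out[i] = buf[n - 1 - i];
}

// Hypothetical host helper (illustrative name). Returns an empty vector if
// the input is empty or the device does not support cooperative launches.
std::vector<float> square_then_reverse_on_gpu(const std::vector<float>& in) {
    int n = static_cast<int>(in.size());
    int dev = 0, supported = 0;
    cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch, dev);
    if (!supported || n == 0) return {};

    // All blocks must be resident at once, so size the grid from occupancy.
    int threads = 256, blocks_per_sm = 0, sm_count = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocks_per_sm, square_then_reverse, threads, 0);
    cudaDeviceGetAttribute(&sm_count, cudaDevAttrMultiProcessorCount, dev);
    int blocks = blocks_per_sm * sm_count;

    float *buf = nullptr, *out = nullptr;
    cudaMalloc(&buf, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemcpy(buf, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Cooperative kernels must be started with this launch API,
    // not with the <<<...>>> syntax.
    void* args[] = {&buf, &out, &n};
    cudaLaunchCooperativeKernel((void*)square_then_reverse,
                                dim3(blocks), dim3(threads), args, 0, nullptr);

    std::vector<float> result(n);
    cudaMemcpy(result.data(), out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(buf);
    cudaFree(out);
    return result;
}
```

Without `grid.sync()`, the second loop could read elements that another block has not squared yet. The usual alternative is to end the kernel and launch a second one.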
Given that AMD's approach has shifted to implementing CUDA (under the name "HIP") and providing tools to automatically find/replace CUDA with HIP, I don't think the CUDA API is going anywhere.
Why did I say partly? So many optimization layers exist already (e.g. BLAS/cuBLAS). One may not really need to get down to the CUDA level.
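As an illustration of staying above the kernel level, here is a minimal sketch that computes y = alpha * x + y with cuBLAS and no hand-written kernel. The helper name `saxpy_with_cublas` is made up for the example, error checking is omitted, and the program needs to be linked against cuBLAS (e.g. `-lcublas`).

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper (illustrative name): computes y = alpha * x + y on the
// GPU via cuBLAS. Returns y unchanged if the sizes do not match or are zero.
std::vector<float> saxpy_with_cublas(float alpha,
                                     const std::vector<float>& x,
                                     std::vector<float> y) {
    int n = static_cast<int>(x.size());
    if (n == 0 || y.size() != x.size()) return y;

    float *dx = nullptr, *dy = nullptr;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    // alpha is passed by host pointer (the default cuBLAS pointer mode);
    // the two 1s are the strides through x and y.
    cublasSaxpy(handle, n, &alpha, dx, 1, dy, 1);

    cudaMemcpy(y.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    cublasDestroy(handle);
    cudaFree(dx);
    cudaFree(dy);
    return y;
}
```

The only CUDA-specific work left is moving data on and off the device; the compute itself is a library call.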