We currently support both CUDA and CPU to some extent. The CPU path is written in standard C++ (with stdpar coming soon). The catch is that standard C++ doesn't cover everything we support (FFTs, matrix multiplies, etc.). One option is to pull in open-source libraries for those pieces, but that ends up as a lot of dependencies that are hard to manage. We plan to improve CPU support soon, so stay tuned.
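To give a rough idea of what "standard C++" covers on the CPU side, here is a minimal sketch of a dot product written with nothing but the C++17 standard library. The function name `dot` is made up for illustration and is not taken from our codebase. The stdpar form of the same call would pass an execution policy such as `std::execution::par_unseq` (from `<execution>`) as the first argument; it is left out here so the snippet builds without a parallel backend.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper for illustration only; not an API of any library
// mentioned in this thread.
// Multiplies x and y element by element and sums the products.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    // This C++17 overload defaults to std::plus for the reduction and
    // std::multiplies for the element-wise step. An overload that takes
    // an execution policy as its first argument also exists.
    return std::transform_reduce(x.begin(), x.end(), y.begin(), 0.0);
}
```

Reductions and element-wise maps like this are easy to express this way. FFTs and matrix multiplies have no standard library equivalent, which is where the dependency question comes in.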
I don't actually know a lot about massively parallel libraries like CUDA. Does AMD have an equivalent technology associated with their GPUs? It feels like it should be fairly straightforward to create some kind of high-level library that just uses CUDA, or whatever AMD has, on the back end.
GPU performance per dollar is only competitive for specific workloads. For extremely large-scale compute, getting enough data center GPUs can also be a challenge.