The CUDA moat is extremely exaggerated for deep learning, especially for inference. It’s simply not hard to do matrix multiplication and a few activation functions here and there.
It regularly shocks me that AMD doesn't release their cards with at least enough CUDA reimplementation to run DL models. As you point out, AI applications use a tiny subset of the overall API, the courts have ruled that APIs can't be protected by copyright, and CUDA is NVIDIA's largest advantage. It seems like an easy win, so I assume there's some good reason.
A very cynical take: AMD and Nvidia CEO’s are cousins and there’s more money to be made with one dominant monopoly than two competitive companies. And this income could be an existential difference-maker for Taiwan.
AMD can't even figure out how to release decent drivers for Linux in a timely fashion. It might not be the largest market, but would have at least given them a competitive advantage in reaching some developers. There is either something very incompetent in their software team, or there are business reasons intentionally restraining them.
From what I've been reading the inference workload tends to ebb and flow throughout the day with much lower loads overnight than at for example 10AM PT/1PM ET. I understand companies fill that gap with training (because an idle GPU costs the most).
So for data centers, training is just as important as inference.
> So for data centers, training is just as important as inference.
Sure, and I’m not saying buying Nvidia is a bad bet. It’s the most flexible and mature hardware out there, and the huge installed base also means you know future innovations will align with this hardware. But it’s not primarily a CUDA thing or even a software thing. The Nvidia moat is much broader than just CUDA.