Every time I'm tempted to think software is easy compared to hardware, I just remember that AMD is leaving about a trillion dollars' worth of market cap on the table because they haven't figured out a good alternative to CUDA.
Fred Brooks wrote in The Mythical Man-Month that it's harder (more time-consuming) to produce the software that corresponds to a given piece of hardware. That was in 1975.
Hardware was much simpler then than it is now. I wonder whether, and how, that's changed in going from hundreds or thousands of transistors to billions.
They’ll need to either reverse engineer CUDA or incentivize reimplementing everything out there on ROCm/OpenCL, forgoing all the workload optimization already done for Nvidia GPUs. I think that’s a non-trivial moat.