Completely agree. It's been 18 years since Nvidia released CUDA. AMD has had a long time to figure this out so I'm amazed at how they continue to fumble this.
AMD's software investments only began in earnest a few years ago, but AMD really has progressed more than pretty much everyone else aside from Nvidia, IMO.
AMD also made a few bad decisions where they "split the bet", relying on Microsoft and others to push software forward. (I did like C++ AMP, for what it's worth.) The underpinnings of C++ AMP led to Boltzmann, which led to ROCm, which then needed to be ported away from C++ AMP and onto the CUDA-like HIP.
So it's a bit of a misstep there for sure. But it's not like AMD has been dilly-dallying. And for what it's worth, I would personally have preferred C++ AMP (a way to represent GPU functions as C++11 []-lambdas rather than CUDA-specific <<<extensions>>>). Obviously everyone else disagrees with me, but there's some elegance to parallel_for_each([](param1, param2){magically a GPU function executing in parallel}), where the compiler figures out the details of how to get param1 and param2 from CPU RAM into GPU memory (or you use GPU-specific allocators to create param1/param2 on the GPU side already and bypass the automagic).
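To make the lambda-first shape concrete, here is a rough sketch in plain standard C++. It is only a CPU stand-in: the parallel_for_each helper and the saxpy function below are hypothetical names written for this illustration, run on std::thread workers, and are not the actual C++ AMP API (which used its own types and a restrict(amp) annotation on the lambda). In CUDA the loop body would instead live in a separate __global__ function launched with kernel<<<blocks, threads>>>(...).

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical CPU stand-in for a GPU-style parallel_for_each: calls
// body(i) once for every index in [0, n), split across worker threads.
// This is NOT the C++ AMP signature, only the same lambda-first shape.
template <typename Body>
void parallel_for_each(std::size_t n, Body body) {
    if (n == 0) return;
    std::size_t hw = std::thread::hardware_concurrency();  // may report 0
    std::size_t workers = std::min<std::size_t>(n, hw == 0 ? 1 : hw);
    std::size_t chunk = (n + workers - 1) / workers;        // ceiling division
    std::vector<std::thread> pool;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        std::size_t end = std::min(n, begin + chunk);
        // Each worker handles one contiguous slice of the index range.
        pool.emplace_back([begin, end, &body] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    for (auto& t : pool) t.join();
}

// Illustrative "kernel": out[i] = a * x[i] + y[i], written inline as a lambda.
// Assumes y has at least as many elements as x.
std::vector<float> saxpy(float a, const std::vector<float>& x,
                         const std::vector<float>& y) {
    std::vector<float> out(x.size());
    parallel_for_each(x.size(), [&](std::size_t i) {
        out[i] = a * x[i] + y[i];  // each index is written by exactly one worker
    });
    return out;
}
```

Calling saxpy(2.0f, x, y) reads like ordinary C++ at the call site, which is the appeal: the parallel body sits right where it is used, and in a real GPU framework the runtime rather than the caller would deal with moving x and y to the device. On some toolchains this needs -pthread to link.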
CUDA of 18 years ago is very different to CUDA of today.
Back then AMD/ATI were actually at the forefront on the GPGPU side: things like the early Brook language and CTM led pretty quickly into things like OpenCL. Lots of work went on using the Xbox 360 GPU in real games for GPGPU tasks.
But CUDA steadily improved iteratively, and AMD kinda just... stopped developing their equivalents? Considering that for a good part of that time they were near bankruptcy, it might not have been surprising though.
But saying Nvidia solely kicked off everything with CUDA is rather ahistorical.
AMD kinda just... stopped developing their equivalents?
It wasn't so much that they stopped developing; rather, they kept throwing everything out and coming out with new, non-backwards-compatible replacements. I knew people working in the GPU compute field back in those days who were trying to support both AMD/ATI and Nvidia. While their CUDA code just worked from release to release, and every new release of CUDA just got better and better, AMD kept coming up with new breaking APIs and forcing rewrite after rewrite, until they just gave up and dropped AMD.
Yep! I used BrookGPU for my GPGPU master's thesis, before CUDA was a thing.
AMD lacked follow-through on the software side, as you said, but a big factor was also NV handing out GPUs to researchers.
10 years ago they were basically broke and bet the farm on Zen. That bet paid off. I doubt a bet on CUDA would have paid off in time to save the company. They definitely didn't have the resources to split that bet.