| HN Mirror



	by georgeecollins 473 days ago
	I would really love it if people on Hacker News could weigh in on how much of a moat they think CUDA really is. As in: How hard is it to use something else? If you started a project today, how much would you want to get paid to not use CUDA? A lot of readers on this site have good insight into this, and it is a key question financial people are asking without the knowledge many people here possess.

6 comments

czk 473 days ago

SemiAnalysis has a nice write-up on MI300X vs H100/H200 and concludes that the CUDA moat is still very real: https://semianalysis.com/2024/12/22/mi300x-vs-h100-vs-h200-b...

"As fast as AMD tries to fill in the CUDA moat, NVIDIA engineers are working overtime to deepen said moat with new features, libraries, and performance updates."


BitwiseFool 473 days ago

AMD's competitor to CUDA is ROCm. Historically, AMD has been hobbled by the quality of their drivers and because they sold less performant hardware. AMD has traditionally been the budget option for both CPUs and GPUs. Things have changed in the CPU space because of Ryzen, but sadly AMD has not been able to realize an equivalent competitive advantage in the GPU space. Intel has also entered the GPU market, but they are even farther behind than AMD. The same problems I am about to describe apply to them as well, to a higher degree.

Rewriting CUDA programs to run using ROCm is expensive and time consuming. It is difficult to justify this expense when in all likelihood the ROCm version will be less efficient, less performant, and less stable than the original. In the grand scheme of things, AMD hardware is indeed cheaper but it's not that much cheaper. From a business standpoint, it's just not worth it.
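
To make the porting cost a little more concrete, here is a minimal Python sketch of the purely mechanical part of a port. It assumes only that AMD's HIP runtime mirrors many CUDA runtime names with a `hip` prefix; the handful of renames in the table are real HIP names, but the table itself and the helpers `toy_hipify` and `leftover_cuda_names` are hypothetical illustrations, not how any real porting tool is implemented.

```python
import re

# Assumption: a deliberately tiny, hand-picked subset of CUDA -> HIP renames.
# A real port has to cover far more of the API than this.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Longest names first, with word boundaries, so cudaMemcpyHostToDevice is
# never rewritten as if it were cudaMemcpy plus a suffix.
_RENAME = re.compile(
    r"\b(?:"
    + "|".join(re.escape(n) for n in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)

# Identifiers that still look like CUDA runtime or CUDA library calls.
_LEFTOVER = re.compile(r"\bcu(?:da|blas|dnn|fft|rand|sparse)\w*")


def toy_hipify(source):
    """Rename the CUDA names listed in CUDA_TO_HIP; leave everything else."""
    return _RENAME.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)


def leftover_cuda_names(source):
    """Return the sorted CUDA-looking identifiers the toy table did not cover."""
    return sorted(set(_LEFTOVER.findall(source)))


sample = """#include <cuda_runtime.h>
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
cublasSgemm(handle, /* ... */);
cudaFree(d_x);
"""

ported = toy_hipify(sample)
print(ported)
print(leftover_cuda_names(ported))
```

In this sketch the runtime calls are renamed mechanically, while the `cublasSgemm` call is reported as a leftover because the toy table does not cover it. Renaming is the cheap part; the expense described above is in everything that comes after it, such as library coverage, performance tuning, and stability testing on the new stack.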

Knowing what I know about how management thinks, even if AMD managed to make an objectively superior product at a much better price, institutional momentum alone would keep people on CUDA for a long time.


JohnBooty 472 days ago

    AMD has been hobbled by the quality of their drivers

I always hear this and I believe it, but I've never been able to find any insight about what exactly is holding them back.

Given the way nVidia is printing money, surely it absolutely cannot be a lack of motivation on AMD's part?

This is a very uninformed thought as I have no experience writing drivers, nor am I familiar with the various things supported by CUDA and ROCm. But how is AMD struggling with ROCm compute drivers, when their game drivers have been plenty stable as far as I have experienced? Surely the surface area of functionality needed for the graphics drivers is larger and therefore the compute drivers should be a relatively easier task? Or am I wrong and CUDA has a bunch of higher-level stuff baked into it and this is what AMD struggles to match?

    and because they sold less performant hardware.

Does anybody have any insight into specifically what part of compute performance AMD is struggling to match? Did AMD bet on the wrong architectural horse entirely? Are they unable to implement really basic compute primitives as efficiently as they want because nVidia holds key patents? Did nVidia lock down the entire pool of engineers who can implement this shit in a performant way?

I mean, aside from GPU compute stuff, it sure looks to me like AMD is executing well. It doesn't seem like they're a bunch of dunces over there. Quite the opposite?


czk 473 days ago

Never underestimate the power of institutional momentum! *cough* IBM AS400


jononor 473 days ago

One factor is how close to the bleeding edge one needs to be, and how niche the model/application is. ROCm lags by some years, and application/model/framework developers test less on it, which can be problematic in niches. For something very established, like image classification, that does not really matter: 3-year-old CNNs will generally do the trick. But if one wants to drop in model X that was put on GitHub/HuggingFace within the last year, one would be buying a lot of trouble.


ChocolateGod 473 days ago

> could weigh in on how much of a moat they think CUDA really is.

There's movement to implement CUDA libraries that work on non-Nvidia cards, but I guess adoption could be hindered by legal fears.

https://github.com/vosen/ZLUDA


latchkey 472 days ago

Wrong project. ZLUDA will never support enterprise.

What you're looking for is SCALE...

https://docs.scale-lang.com/

and they are making amazing progress.


r1chardnl 473 days ago

Whenever a new AI model gets released and made available to the public, it tends to be NVIDIA-only. The last few I've tried always were, I assume because that's what the researchers had at their disposal.


ljlolel 473 days ago

So why give the valuable knowledge away for free?


ofrzeta 473 days ago

It's the hacker philosophy, isn't it?
