by roenxi 920 days ago
Gelsinger is saying "the entire industry" and that seems likely to be a simple fact. Every single player, other than Nvidia, has an incentive to minimise the importance of CUDA as a proprietary technology. That is a lot more programmers than Nvidia can afford to employ.

Even if Intel falls over its own feet, the incentives to bring in more chip manufacturers are huge. It'll happen; the only question is whether the timeframe is months, years or a decade. My guess is shorter timeframes: this seems to mostly be matrix multiplication and there is suddenly a lot of money and attention on the matter. And AMD's APU play [0] is starting to reach the high end of the market with the MI300A, which is an interesting development.

[0] EDIT: For anyone not following that story, they've been unifying system and GPU memory; so if I've understood this correctly there isn't any need to "copy data to the GPU" any more on those chips. Basically the CPU will now have big extensions for doing matrix math. Seems likely to catch on. Historically they've been adding that tech to low-end CPUs, so it wasn't useful for AI work; now they're adding it to the big ones.
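
To make the "no copy" point concrete, here is a rough analogy in plain Python. This is only a sketch under loud assumptions: it runs entirely in host memory, it does not touch any GPU or AMD API, and the names `device_copy` and `shared_view` are made up for illustration. It just contrasts working on a separate copy of a buffer with working on a shared view of the same bytes.

```python
# Analogy only: ordinary host memory stands in for both "system" and "GPU" memory.
data = bytearray(b"\x01\x02\x03\x04")

# Discrete-memory model: the "device" works on its own copy of the buffer.
device_copy = bytes(data)        # explicit copy, standing in for a host-to-device transfer

# Unified-memory model: both sides look at the same bytes.
shared_view = memoryview(data)   # no copy is made

data[0] = 9                      # the "CPU" updates the buffer afterwards

copy_sees_update = device_copy[0] == 9   # the copy is stale
view_sees_update = shared_view[0] == 9   # the view shares storage with data
```

In the copy model the update has to be transferred again before the other side sees it; in the shared model there is nothing to transfer.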

3 comments

> That is a lot more programmers than Nvidia can afford to employ.

How many programmers one can employ is determined by profits, and Nvidia has monopoly profits thanks to CUDA, while "the entire industry" can at best hope to create some commoditized alternative to CUDA. Companies with real market power can beat entire industries of commodity manufacturers; Apple is the prime example.

AMD and Intel together have more revenue than Nvidia, even without considering any other player in the industry or any community contributions they get from being open source.
It's not about revenue, it's about investment, which is closely related to future profit and not so much to current revenue ...
Profit is revenue minus costs. Investment is costs. If you're reinvesting everything you take in, your current-year profit would be zero because you're making large investments in the future.
How many programmers do you really need though to catch up to what CUDA has already? The path has been laid. There's no need for experimentation. Just copy what NVIDIA did. No?
I'm not an expert here, but with:

> That is a lot more programmers than Nvidia can afford to employ

How do you account for the increased complexity those developers have to deal with in an environment where there are multiple companies with conflicting incentives working on the standard?

My gut reaction is to worry if this is one of those problems like "9 people working together can't have a baby in one month".

I actually find that a really interesting question with a really interesting answer - the scaling properties of large groups of people are unintuitive. In this case, my guess would be high market complexity, with the entire userbase ignoring that complexity in favour of 1-2 vendors with simple and cheap options. So the market overall will just settle on de-facto standards.

Of course, based on what we see right now that standard would be Nvidia's CUDA; but while CUDA is impressive I don't think running neural nets requires that level of complexity. We're not talking about GUIs, which are one of the stickiest and most complicated blocks of software we know about, or complex platform-specific operations. I'd expect the need for specialist libraries to do inference to go away in time and CUDA to be mainly useful for researching GPU applications to new problems. Training will likely just come down to raw ops/second in hardware rather than software.

It isn't like this stuff can't already run on other cards. AMD cards can run Stable Diffusion or LLMs. The issue is just that AMD drivers tend to crash. That is simultaneously a huge and a tiny problem - if they focus on it, it won't be around for long. CUDA is an advantage, but not a moat.

> Gelsinger is saying "the entire industry" and that seems likely to be a simple fact. Every single player, other than Nvidia, has an incentive to minimise the importance of CUDA as a proprietary technology. That is a lot more programmers than Nvidia can afford to employ.

I mean, this statement is technically true, but it's true for any proprietary technology. If things work like this then we won't have any industry where proprietary techs/formats are prevalent.

I suppose, but it is a practical matter here. CUDA is a library for memory management and matrix math targeted at researchers, hyper-productive devs and enthusiasts. It looks like it'll be highly capital intensive, requiring hardware that runs in some of the biggest, nastiest, OSS-friendliest data-centres in the world who all design their own silicon. The generations of AMD GPU that matter - the ones out and on people's machines - aren't supported for high quality GPGPU compute right now. Alright, that means CUDA is a massive edge right now. But that doesn't look like a defensible moat.

I was interested in being part of this AI thing, and what stopped me wasn't lack of CUDA; it was that my AMD card reliably crashes under load doing compute workloads. Then when I saw George Hotz having a go, the problem wasn't lack of CUDA; it was that his AMD card crashed under compute workloads (technically I think it was running the demo suite). That is only anecdata, but 2 for 2 is almost a significant number of people, given the small number of players and lack of big money in AI historically.

Lacking CUDA specifically might be a problem here, but I've never seen AMD fall down at that point. I've only ever seen them fall down at basic driver bugs. And I don't see how CUDA would matter all that much because I can implement most of what I need math-wise in code. If I see a specific list of common complaints maybe I'll change my mind, but I'm just not detecting where the huge complexity is. I can see CUDA maintaining an edge for years because it is convenient, but I really don't see how it can stay essential. The card can already do the workload in theory, and in practice assuming the code path doesn't bug out. I really don't need CUDA; all I want is for rocBLAS to not crash. I suspect that'd go a long way in practice.
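
For a sense of scale, the core operation being discussed is small enough to write by hand. A minimal sketch in plain Python, assuming matrices are lists of rows; the function name `matmul` is just illustrative, and this says nothing about speed, which is where tuned libraries such as rocBLAS earn their keep:

```python
def matmul(a, b):
    """Naive matrix product of two matrices given as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    # Every row of `a` must have as many entries as `b` has rows.
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    # Entry (i, j) is the dot product of row i of `a` and column j of `b`.
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
        for i in range(n)
    ]


result = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
# result == [[19, 22], [43, 50]]
```

The maths is the easy part; the hard part is running it fast and reliably on the hardware, which is the comment's point about driver stability.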

AMD could use testers (cough, clients I mean) like you. Jokes aside, please report bugs to the ROCm GitHub.
Unless their hardware is on the official support list, I wouldn't be too hopeful for a quick resolution. Still, it's even less likely to get fixed if it's not reported.

If nothing else, I would be curious to know more about the issue. Personally, I want to know how well ROCm functions on every AMD GPU.