by roenxi 920 days ago
I suppose, but it is a practical matter here. CUDA is a library for memory management and matrix math targeted at researchers, hyper-productive devs and enthusiasts. It looks like it'll be highly capital intensive, requiring hardware that runs in some of the biggest, nastiest, OSS-friendliest data-centres in the world who all design their own silicon. The generations of AMD GPU that matter - the ones out and on people's machines - aren't supported for high quality GPGPU compute right now. Alright, that means CUDA is a massive edge right now. But that doesn't look like a defensible moat.

I was interested in being part of this AI thing; what stopped me wasn't lack of CUDA, it was that my AMD card reliably crashes under load doing compute workloads. Then when I saw George Hotz having a go, the problem wasn't lack of CUDA either; it was that his AMD card crashed under compute workloads (technically I think it was running the demo suite). That is only anecdata, but 2 for 2 is almost a significant number of people, given the small number of players and the historical lack of big money in AI.

Lacking CUDA specifically might be a problem here, but I've never seen AMD fall down at that point. I've only ever seen them fall down at basic driver bugs. And I don't see how CUDA would matter all that much because I can implement most of what I need math-wise in code. If I see a specific list of common complaints maybe I'll change my mind, but I'm just not detecting where the huge complexity is. I can see CUDA maintaining an edge for years because it is convenient, but I really don't see how it can stay essential. The card can already do the workload in theory, and in practice too, assuming the code path doesn't bug out. I really don't need CUDA; all I want is for rocBLAS to not crash. I suspect that'd go a long way in practice.

1 comment

AMD could use testers (cough, clients I mean) like you. Jokes aside, please report bugs to the ROCm GitHub.
Unless their hardware is on the official support list, I wouldn't be too hopeful for a quick resolution. Still, it's even less likely to get fixed if it's not reported.

If nothing else, I would be curious to know more about the issue. Personally, I want to know how well ROCm functions on every AMD GPU.