| HN Mirror

	by kmeisthax 612 days ago
	> I think some hardware vendors just release the compute units without shipping proper support yet

	This is Nvidia's moat. Everything has optimized kernels for CUDA, and maybe for Apple Accelerate (which, before M4, is the only way to touch the CPU matrix unit, and the only way to touch the NPU at all). If you want to use anything else, either prepare to upstream patches to your ML framework of choice or prepare to write your own training and inference code.

noduerme 611 days ago

I'm not sure why this is a moat. Isn't it just a matter of translation from CUDA to some other instruction set? If AMD or someone else makes cheaper hardware that does the same thing, it doesn't seem like a stretch for them to release a PyTorch patch or whatever.

david-gpu 611 days ago

Most of the computations are done inside Nvidia's proprietary libraries, not in open-source CUDA code. And if you saw what goes into those libraries, I think you would agree that it is a substantial moat.

theGnuMe 611 days ago

There are clean-room approaches, like AMD's and Scale's.

caeril 611 days ago

Geohot has multiple (and ongoing) rants about the sheer instability of AMD RDNA3 drivers. Lisa Su engaged directly with him on this, and she didn't seem to give a shit about their problems.

AMD is not taking ML applications seriously, outside of their marketing hype.

fvv 611 days ago

RDNA3 (AMD's consumer graphics architecture) is not CDNA (its datacenter compute architecture).

david-gpu 611 days ago

Are you suggesting that Scale can take cuDNN kernels and run them at anything resembling peak performance on AMD GPUs?

Because functional compatibility is hardly useful if the performance is not up to par, and cuDNN runs specific kernels that are tuned not only to a specific GPU model but also to the specific inputs the user is submitting. Nvidia does a ton of work behind the scenes, both to develop high-performance kernels for each exact architecture and to know which ones are best for a particular application.

This is probably the main reason why I was hesitant to join AMD a few years ago and to this day it seems like it was a good decision.
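
To make the point about per-input kernel selection concrete, here is a minimal Python sketch of the general idea: keep several functionally identical "kernels", benchmark them the first time a given input shape appears, and reuse the fastest one for that shape afterwards. This is not how cuDNN is implemented; the two loop-order matrix multiplies, `ShapeAutotuner`, and the shape-keyed cache are all hypothetical stand-ins chosen only to illustrate the technique. (PyTorch exposes a comparable benchmark-then-cache behavior for cuDNN convolutions through `torch.backends.cudnn.benchmark`.)

```python
import time
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]

def matmul_ijk(a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical kernel 1: textbook loop order, inner loop walks down a column of b."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            c[i][j] = s
    return c

def matmul_ikj(a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical kernel 2: reordered loops, inner loop walks along rows of b and c."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        ci = c[i]
        for p in range(k):
            aip, bp = a[i][p], b[p]
            for j in range(m):
                ci[j] += aip * bp[j]
    return c

class ShapeAutotuner:
    """Hypothetical autotuner: times every candidate the first time an input
    shape is seen, caches the winner, and dispatches to it on later calls."""

    def __init__(self, candidates: Dict[str, Callable[[Matrix, Matrix], Matrix]], trials: int = 3):
        self.candidates = candidates
        self.trials = trials
        self.cache: Dict[Tuple[int, int, int], str] = {}

    def _time(self, fn: Callable[[Matrix, Matrix], Matrix], a: Matrix, b: Matrix) -> float:
        # Best-of-N wall-clock time to reduce noise from other processes.
        best = float("inf")
        for _ in range(self.trials):
            start = time.perf_counter()
            fn(a, b)
            best = min(best, time.perf_counter() - start)
        return best

    def __call__(self, a: Matrix, b: Matrix) -> Matrix:
        shape = (len(a), len(b), len(b[0]))  # (rows of a, inner dim, cols of b)
        if shape not in self.cache:
            timings = {name: self._time(fn, a, b) for name, fn in self.candidates.items()}
            self.cache[shape] = min(timings, key=lambda name: timings[name])
        return self.candidates[self.cache[shape]](a, b)

matmul = ShapeAutotuner({"ijk": matmul_ijk, "ikj": matmul_ikj})

if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(matmul(a, b))   # [[19.0, 22.0], [43.0, 50.0]]
    print(matmul.cache)   # which kernel wins for (2, 2, 2) depends on the machine
```

Both candidates return the same answer; only their speed differs, and the winner can change with the shape and the hardware. That is the kind of knowledge a vendor library accumulates per architecture and per input, and it is what a straight source translation does not carry over.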

blharr 611 days ago

Sure, you can probably translate the rough code and get something that "works", but the thousands of small optimizations baked into it are not trivial to translate.

noduerme 610 days ago

I like the take that small optimizations, taken together, amount to a moat. I feel like this could be a profoundly understated paradigm.
