by digitallyfree 1109 days ago
The issue with AMD and AI is, as always, the software stack. Even if the hardware is great, ROCm simply doesn't have industry traction and accessibility.

Doesn't have the traction for now. Cloud providers (ms, google, amz, etc) are quickly tiring of paying Nvidia monopoly premiums for their gpu hardware. Google has already invested in TPUs, and it wouldn't surprise me at all if they got together to fund ROCm development or even went so far as to develop their own NN ASICs.

CUDA is great, but it's not strictly necessary for many of the latest AI/ML developments.

ROCm is so terrible that the cloud providers rolled out their own chips rather than use AMD, which has perfectly good GPUs and the worst software stack ever.

ROCm still doesn't support consumer GPUs. That means people building random things (as opposed to more serious work things) won't be using their stack, so none of the innovation will be there.

It may be possible to use it with consumer GPUs anyway, but many won't try because it's not officially supported.

https://rocm.docs.amd.com/en/latest/release/gpu_os_support.h... https://developer.nvidia.com/cuda-gpus

Intel, AMD, Google, Amazon, etc. should team up to create some sort of standards body or consortium around an open-source CUDA alternative, something that anyone who can fabricate chips could use. The consortium could have its own team of devs/researchers to make improvements and build next-gen versions of that CUDA alternative.

Something like the Chrome vs. Chromium arrangement, or even a foundation like the Linux Foundation, where you have multiple distros contributing packages etc. back into the ecosystem.

They are already doing that with XLA. Google has TPUs, Amazon has Trainium/Inferentia. The goal seems to be a common interface where you basically just cast the model with `.toDevice`, passing a value from an enum of accelerated computing types.
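
To make that idea concrete, here is a rough sketch of what such an interface could look like. Everything in it is hypothetical: the `Accelerator` enum, the `Model` class and its `to_device` method are made up for illustration and are not part of XLA or any real library.

```python
from dataclasses import dataclass
from enum import Enum


class Accelerator(Enum):
    # Hypothetical backend names, chosen only for illustration.
    CPU = "cpu"
    GPU = "gpu"
    TPU = "tpu"
    TRAINIUM = "trainium"


@dataclass(frozen=True)
class Model:
    weights: tuple
    device: Accelerator = Accelerator.CPU

    def to_device(self, device: Accelerator) -> "Model":
        # A real runtime would compile for the target and copy buffers here.
        # This sketch only returns a copy of the model tagged with the new device.
        return Model(self.weights, device)


# The calling code names a target from the enum and nothing vendor-specific.
model = Model(weights=(0.1, 0.2)).to_device(Accelerator.TPU)
print(model.device.value)
```

The point of the enum is that user code never touches a vendor API directly; whatever sits behind `to_device` would be responsible for the per-chip work.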
> Cloud providers (ms, google, amz, etc) are quickly tiring of paying Nvidia monopoly premiums for their gpu hardware.

I think cloud providers love the exclusivity (Nvidia's MSRP is significantly higher than the price clouds pay), and based on their pricing compared to competitors like Lambda Labs, they have the highest profit margin on GPU instances. Also, based on availability, they likely have the highest utilisation. They definitely wouldn't want to commoditize the space. Google already has TPUs that it could scale and sell to everyone, but doing so would make the margins significantly smaller.