| HN Mirror


	by digitallyfree 1109 days ago
	The issue with AMD and AI is, as always, the software stack. Even if the hardware is great, ROCm simply doesn't have industry traction and accessibility.

wing-_-nuts 1109 days ago

Doesn't have the traction for now. Cloud providers (MS, Google, Amazon, etc.) are quickly tiring of paying Nvidia monopoly premiums for their GPU hardware. Google has already invested in TPUs, and it wouldn't surprise me at all if they got together to fund ROCm development or even went so far as to develop their own NN ASICs.

CUDA is great, but it's not strictly necessary for many of the latest AI/ML developments.

zwaps 1109 days ago

ROCm is so terrible that the cloud providers rolled out their own chips rather than use AMD, which has perfectly good GPUs and the worst software stack ever.

mook 1109 days ago

ROCm still doesn't support consumer GPUs; that means people building random things (as opposed to more serious work things) won't be using their stack, so none of the innovation will be there.

It may be possible to use it with consumer GPUs anyway, but many won't try because it's not officially supported.

https://rocm.docs.amd.com/en/latest/release/gpu_os_support.h... https://developer.nvidia.com/cuda-gpus

gremlinsinc 1109 days ago

Intel, AMD, Google, Amazon, etc. should team up to create some sort of standard or consortium around an open source CUDA alternative, something that anyone who can fabricate chips could use. The consortium could have its own team of devs/researchers to make improvements and build next-gen versions of that CUDA alternative.

Something like the way Chrome vs. Chromium works, or even a foundation like the Linux Foundation, where you have multiple distros contributing packages, etc. back into the ecosystem.

mepian 1109 days ago

They already did: https://www.khronos.org/sycl/

bfeynman 1109 days ago

They are already doing that with XLA: Google has TPUs, Amazon has Trainium/Inferentia. The goal seems to be a common interface in the future where you basically just cast a model with `.toDevice` to one of an enum of accelerated computing types.
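
A rough sketch of what that kind of interface could look like, purely for illustration. Every name here (`Accelerator`, `Model`, `to_device`) is hypothetical and not part of XLA or any real framework; a real stack would compile and transfer the model for the chosen backend, whereas this toy only records the target.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Accelerator(Enum):
    """Hypothetical enum of accelerator types (illustrative names only)."""
    CPU = "cpu"
    NVIDIA_GPU = "cuda"
    AMD_GPU = "rocm"
    GOOGLE_TPU = "tpu"
    AWS_TRAINIUM = "trainium"


@dataclass
class Model:
    """Toy model that only tracks which device its weights are assigned to."""
    weights: List[float]
    device: Accelerator = Accelerator.CPU

    def to_device(self, device: Accelerator) -> "Model":
        # Assumption: a real framework would lower the model through a
        # compiler backend at this point. Here we just copy the weights
        # and tag the copy with the requested device.
        if not isinstance(device, Accelerator):
            raise TypeError("device must be an Accelerator member")
        return Model(weights=list(self.weights), device=device)


model = Model(weights=[0.1, 0.2, 0.3])
tpu_model = model.to_device(Accelerator.GOOGLE_TPU)
print(tpu_model.device.name)
```

The point of the enum is that user code names a target and nothing else; anything vendor-specific would live behind `to_device`.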

YetAnotherNick 1109 days ago

> Cloud providers (MS, Google, Amazon, etc.) are quickly tiring of paying Nvidia monopoly premiums for their GPU hardware.

I think cloud providers love exclusivity (Nvidia's MSRP is significantly higher than the price available to clouds), and based on pricing compared to competitors like Lambda Labs, they have the highest profit margins on GPU instances. Also, based on availability, they likely have the highest utilisation. They definitely wouldn't want to commoditize the space. Google already has TPUs that they could scale and sell to everyone, but doing so would make the margins significantly smaller.
