by Minks 373 days ago
ROCm really is hit or miss depending on the use case.

Plus their consumer card support is questionable to say the least. I really wish it was a viable alternative, but swapping to CUDA really saved me some headaches and a ton of time.

Having to run MIOpen benchmarks for HIP can take forever.


Exactly the same has been said over and over again, ever since CUDA took off for scientific computing around 2010. I don’t really understand why 15 years later AMD still hasn’t been able to copy the recipe, and frankly it may be too late now with all that mindshare in NVIDIA’s software stack.
Just remember that 4 of the top 10 Top500 systems run on AMD Instinct cards, based on the latest June 2025 list announced at ISC Hamburg.

NVIDIA has a moat for smaller systems, but that is not true for clusters.

As long as you have a team to work with the hardware you have, performance beats mindshare.

The Top500 is an irrelevant comparison; of course AMD is going to give direct support to single institutions that give them hundreds of millions of dollars and help make their products work acceptably. They would be dead if they didn't. Nvidia also does the same thing for their major clients, and yet they still make their products actually work on day 1 on consumer products, too.

Nvidia of course has a shitload more money, and they've been doing this for longer, but that's just life.

> smaller systems

El Capitan is estimated to cost around $700 million or something with like 50k deployed MI300 GPUs. xAI's Colossus cluster alone is estimated to be north of $2 billion with over 100k GPUs, and that's one of ~dozens of deployed clusters Nvidia has developed in the past 5 years. AI is a vastly bigger market in every dimension, from profits to deployments.

HPC has probably been holding AMD back from the much larger AI market.
Custom builds with top paid employees to make the customer happy.
What do you mean?
Besides sibling comment, HPC labs are the kind of customers that get hardware companies to fly in engineers when there is a problem bringing down the compute cluster.
Presumably that in HPC you can dump enough money into individual users to make the platform useful in a way that is impossible in a more horizontal market. In HPC it used to be fairly common to get one of only 5 machines with a processor architecture that had never existed before, dump a bunch of energy into making it work for you, and then throw it all out in 6 years.
It's just not easy. Even if AMD was willing to invest in the required software, they would need a competitive GPU architecture to make the most of it. It's a lot easier to split 'cheap raster' and 'cheap inference' into two products, despite Nvidia's success.
Well, AMD is supposed to be releasing UDNA next year, which will presumably ‘unite’ capabilities like raster and inference within one architecture.