Hacker News
by Flenkno 540 days ago
Nvidia has a unique problem: it wants to move fast and has a shitload of money.

Neither Nvidia nor AMD needs to go to an industry standard first.

Personally, I think it would be great if it gets backported, but it's so far away from a normal use case.

Nvidia is the one who went to an industry standard before their competitors in this space. It was created in 1999 and is called InfiniBand:

https://en.wikipedia.org/wiki/InfiniBand

Infiniband is extremely popular in the HPC space, which is why Nvidia adopted it. Everyone else saw Nvidia adopt it and said "Let us make a new network standard to be incompatible". This is mind-boggling.

Even more mind-boggling is that many of the companies in the Ultra Ethernet Consortium are members of the Infiniband Trade Association, AMD included:

https://www.infinibandta.org/member-listing/

This would be like the automotive industry forming a consortium to invent new incompatible wheels to exclude a successful upstart that adopted their existing standard wheel designs. With trillions of dollars in revenue on the line, you would think that companies would use existing networking standards to focus on building competitive hardware with reduced time to market, yet they are instead reinventing networking standards just because they can. This is a huge gift to Nvidia, since it means that everyone else is wasting time and money instead of being competitive.

Sorry to hound you for the third time, but this is wrong:

> Infiniband is extremely popular in the HPC space

Not anymore. There used to be Cray Aries/GNI and psm/psm2, and now there's Slingshot, the new Cornelis stuff, etc. There's almost no Infiniband now.

The top500 says otherwise:

https://www.infinibandta.org/infiniband-and-roce-advances-fu...

Where are you getting your information?

I concede that the specifics of what I said were wrong, but the larger point was not.

If you buy a single DGX H100 rack and run LINPACK, you automatically get TOP500-grade numbers. Infiniband is a solid product, if not the best commercial offering for AI/ML, but no one buys it for an HPC cluster separately from the DGX boxes.

#26 on the list uses AMD GPUs with Infiniband:

https://www.top500.org/system/180171/

You can likely find more. Infiniband has been excellent for HPC since the 2000s. That includes all HPC workloads, not just AI/ML.

Excuse me if I do not believe your claims concerning Infiniband. They contradict not only actual data, but also what I have heard from people I consider experts.

Also, you did not answer my question concerning the origin of your information. I notice from another comment of yours that you have been talking to an LLM about this conversation. Have you been posting things that an LLM tells you?