Hacker News
by alephnerd 422 days ago
Services! Services services services!

This is what will help protect Nvidia now that DC and cluster spend is cooling.

They own the ecosystem thanks to CUDA, Infiniband, NGC, NVLink, and other key tools. Now they should add more applications (the AI Foundry is a good way to do that) or make forays into adjacent spaces like white-labeled cluster management.

Working on building custom designs and consulting on custom GPU projects would be helpful as well by helping monetize their existing design practice during slower markets.

Of course, Nvidia is starting to do both: Nvidia AI Foundry covers the former, and for the latter it is starting a GPU architecture and design consulting practice, as announced at GTC and under McKinney.

3 comments

> They own the ecosystem thanks to CUDA, Infiniband, NGC, NVLink,

No they do not. The article explains that Google, Amazon, Microsoft, and Meta are developing their own hardware and software for AI/HPC.

Google Gemini was not trained using CUDA or Nvidia hardware.

Only one of the 4 companies you mention is successful at this. And it will remain that way.

Chinese CSPs are the only ones that can develop their own hardware / software for AI / HPC.

Of course corporations will have a lot of different bets. Most of them will not pan out but they will try.

Meta will not be able to produce a chip that can run GenAI workloads in the next 2 years.

Microsoft is doing a side-quest, and they haven't even proved themselves with their FPGA adventure and ARM server adventure.

Amazon is legit; they have done well on ARM servers, but Trainium is TBD, and how much they will pull back in a recession, given that Jassy is a numbers guy, is a question mark.

No need to discuss; we can just revisit this in 2 years, when everything will be crystal clear.

Yeah that's right. The chip may not be as good as Nvidia's, but it doesn't need to be. As the article explains, Nvidia can still lose their position even if they have the best chips.

They have to be competitive. TPUs are wildly ahead of the pack. And even they aren't particularly competitive. 12 years of ecosystem development by the most advanced AI ecosystem company on the planet and your (ex-Google!) researchers are still going to pelt you with tomatoes if you tell them you are swapping out their H100 cluster for TPUs. JAX remains niche (not saying bad) and extremely hard to use efficiently without the help of Google (no CUDA for going off the beaten path).

I suspect the closed nature of the ecosystem will preclude them from winning as much as they could.

> Working on building custom designs and consulting on custom GPU projects would be helpful as well by helping monetize their existing design practice during slower markets.

Apart from Nintendo, who has successfully partnered with Nvidia? Apple, Microsoft and Sony have all been burnt in the past.

National Labs (kinda), a big pharma company I don't think I can disclose, and a couple of HFTs, but it's a muscle they will need to build out, because Broadcom and Marvell are eating their cake.

Nvidia started formalizing that last year [0], but it's a new muscle for them.

[0] - https://www.reuters.com/technology/nvidia-chases-30-billion-...

Watch the Nvidia GTC keynote. The list of partners is extensive.

Being a partner on the GTC slide decks isn't remotely good evidence that they haven't been burned by Nvidia.

Yeah, that immediately came to mind: they talk about distributed systems being a problem, but Nvidia owns the battle-tested and well-regarded HPC networking hardware (Infiniband).

There’s maybe some wiggle room, in that these AI distributed systems might not (?) look like HPC/scientific computing systems; maybe they don’t need Infiniband-style low latency. So these other funky networks might work.

But like, Nvidia has the good nodes and the good network. That’s a rough combination to compete against.