by DebtDeflation 106 days ago
> Nvidia's "rack scale" machines like GB200-NVL72s and GB300-NVL72s are basically a fully built rack you roll into a DC and plug into power and network. In that case, Oracle should probably just buy the rack-scale Vera Rubins when they come out instead of Blackwells and roll them into their new DCs.

This is what I don't understand. Why is the article making the assumption that the DC itself is tied to a particular GPU generation? AWS doesn't knock down a building and start over every time Intel releases a new Xeon.


Xeons have a much longer shelf life and serve diverse workloads. If you order hardware specifically for LLM inference and then some new hardware/model combination is much better at that (which it will be, because a lot of people are working on that), you might be in trouble.

It's like setting up a warehouse of GPUs to mine bitcoin while others are switching to ASICs.

Training, you mean. Doing inference on last year's chip is probably OK, but training a frontier model on it is going to be a deal breaker.
No, I mean inference. The idea is that inference demand will be massive and a race to the bottom with razor-thin margins.

Training costs can be amortized over the entire lifetime of the model, but if you lose money on inference or can't offer competitive usage limits for subscribers, there's no amortizing that.
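
To make the amortization point concrete, here is a rough sketch in Python. Every figure in it is hypothetical and chosen only for illustration (the training spend, the token volumes, the price and the serving cost are all assumptions, not numbers from the article or from any vendor). It shows the training share per million tokens shrinking as lifetime volume grows, while a negative inference margin stays negative and the total loss scales with volume.

```python
# All figures below are hypothetical, chosen only to illustrate the argument.

def amortized_training_per_million(training_cost: float, lifetime_million_tokens: float) -> float:
    """One-time training cost spread over every million tokens served in the model's lifetime."""
    return training_cost / lifetime_million_tokens


def inference_margin_per_million(price: float, serving_cost: float) -> float:
    """Margin on each million tokens served, before any training share is counted."""
    return price - serving_cost


if __name__ == "__main__":
    training_cost = 100_000_000.0  # hypothetical one-time training spend, USD
    # Hypothetical lifetime volumes, in millions of tokens.
    for volume in (1_000_000.0, 100_000_000.0):
        share = amortized_training_per_million(training_cost, volume)
        # Hypothetical price and serving cost per million tokens, USD.
        margin = inference_margin_per_million(price=1.5, serving_cost=2.0)
        print(
            f"volume={volume:,.0f}M tokens  "
            f"training share=${share:.2f}/M  "
            f"inference margin=${margin:.2f}/M  "
            f"total inference loss=${-margin * volume:,.0f}"
        )
```

Under these made-up numbers, more volume makes the training cost per token look better and the inference loss look worse, which is the asymmetry being argued.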

No, it's all about having the top model first, and training time is what's crucial. OpenAI has already shown a willingness to bleed money for the sake of its brand, and we can expect that to continue.
OpenAI's economics don't really work unless you happen to be OpenAI.
Infiniband and coherent fabric.