by reissbaker 106 days ago
I run a small open source LLM inference company, Synthetic.new. As far as I can tell, CNBC isn't reporting this accurately: the problem isn't that Oracle is building "yesterday's data centers": they're building Blackwell DCs! Those are today's DCs.

The problem appears to be that Oracle is building today's DCs... Tomorrow. And by the time they come online, Vera Rubins will be out, with 5x efficiency gains. And Oracle is unlikely to want to drop the price of Blackwells 5x, despite them being 5x less efficient.

It's a little unclear to me how bad this is. Nvidia's "rack scale" machines like GB200-NVL72s and GB300-NVL72s are basically a fully built rack you roll into a DC and plug into power and network. In that case, Oracle should probably just buy the rack-scale Vera Rubins when they come out instead of Blackwells and roll them into their new DCs. Tada! Tomorrow's DCs, tomorrow.

OTOH it's possible someone at Oracle screwed up and committed to buying Blackwells at today's prices, delivered tomorrow. Or maybe construction of the physical DCs is behind schedule, so today's Blackwells are sitting around unused, waiting for power and networking tomorrow. Then they're in a bit of trouble.

Regardless, CNBC's reporting seems pretty unclear on what actually happened and whether this is actually bad or not.

19 comments

I really don't want to overrule your expertise in this regard, but a 5x efficiency gain in a single generation feels like it's too much, especially considering how newer process nodes have been yielding smaller and smaller improvements.

Just to compare and contrast:

https://www.videocardbenchmark.net/power_performance.html

Here's a synthetic benchmark page listing every GPU in recent memory. True, it's not AI, but if we look at the 1080 Ti, a 9-year-old card at this point, and compare it with the 5090, we see the gains were 190/74=2.57x over a timespan that involved multiple die shrinks and uArch changes.

I think these numbers might not hold up on IRL workloads, and afaict older datacenter cards still hold up well and are being used in production.

Newer process nodes are not the main avenue of improvement. What those transistors are used for is more important, and it’s plausible that improvements between generations can increase performance by multiples on a specific task. Not all of the improvements are necessarily in the chip itself, either.

E.g. the next gen might have hardware inference for lower bits, more memory bandwidth, etc.

You could just give the TLDR: by far the biggest improvement across the different generations of nVidia chips is calculating faster at half the accuracy. For Blackwell vs Hopper it was "double performance". By which they mean Blackwell can calculate with NVFP4 at twice the rate Hopper can calculate at FP8. Then go back through the generations all the way until you arrive at FP64, where we started. They even made a slight detour to "FP128".

Decide for yourself if this is a real improvement. You should probably consider that nVidia did not just ship the new chips, but also demonstrated training a neural net with NVFP4.

It's not the only improvement, but it is by far the biggest.

As for the future: nobody's gotten FP2 to work satisfactorily yet. But hey, maybe at nVidia's next conference. Even NVFP4 is not actually 4 bits, though (meaning various parts of the computation don't actually happen at 4 bits), and neither was FP8 (you could use it like that, but people didn't).

Almost seems as if microchips are approaching their "B-52 age":

"Those things are still flying! Introduced in 1955!"

"But that was the B version, all those that are still flying are the H version, so many iterations between them!"

"Welcome to 1962"

> but a 5x efficiency gain in a single generation feels like it's too much, especially considering how newer process nodes have been yielding smaller and smaller improvements

The efficiency is in other areas too e.g. memory, network, etc. It's TOTAL.

> Here's a synthetic benchmark page listing every GPU in recent memory

The lack of GPU gains there isn't because of process nodes. Nvidia, and later AMD, stopped investing in that direction. They started optimizing for AI, not graphics.

They are saying what you are saying. At least Deirdre Bosa did. I think there are a lot of folks internally who don't understand the gravity of it and keep questioning it.

You are right about the building of today's DCs. There is a small part of me that feels Oracle might be a bit toxic long term with all this debt he and his kid have taken on. And this could be the first reaction to it.

But this is exactly why Oracle is the wrong partner. They don’t get it. They never will. To them, it’s just a “workload”.

Likely aimed at classified/defence environments. In those spaces, hardware typically takes 18–36 months after commercial deployment before it’s approved, due to firmware vetting, side-channel analysis, crypto validation, and similar processes.

Meanwhile, commercial operators have already deployed their hardware for public workloads. Existing Blackwell capacity won’t just be shifted into classified environments—governments don’t repurpose hardware from unclassified infrastructure for secret/TS systems. That deployed stock will stay in the private sector for hosted AI workloads.

For many high-security use cases, new Blackwell systems may effectively be the only viable option, especially given the slow review cycles around new firmware and GPU software stacks. Newer chipsets will also be prioritized for training due to performance gains.

Oracle likely recognizes this dynamic and is betting competitors may eventually need to deploy in their data centers. Governments haven’t historically deployed GPU capacity at this scale, beyond ASIC/FPGA crypto workloads, and likely don’t have large pools of pristine Blackwell hardware available.

They’re also purchasing late in the cycle, which may work in their favour.

One could argue that the headline is correct then. Today is tomorrow’s yesterday.

A 5x improvement in energy efficiency in just the GPUs translates to more like a 50% reduction in power usage, which is significant but doesn't warrant an 80% reduction in pricing. Especially since Nvidia will charge more for the same card: they have been pricing things pretty aggressively.

And on the DC side they will be building to a power and heat budget. If Vera Rubin changes the power density per rack equation, that may have some impact. But thinking rationally, if the flops per kW-sq ft are lower than Blackwell, no problem. If they are a lot better, then even if the kW per sq ft is higher you can just space the racks out a little.
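To make the "5x on the GPUs is more like 50% overall" arithmetic concrete, here is a minimal sketch. It assumes the GPUs account for some fixed share of total power for a given amount of work and that everything else (CPUs, networking, cooling overhead) stays the same; the `gpu_share` values are illustrative assumptions, not numbers from the comment.

```python
def total_power_reduction(gpu_share: float, efficiency_gain: float) -> float:
    """Fractional drop in total power for the same work.

    gpu_share: assumed fraction of total power drawn by the GPUs today.
    efficiency_gain: how many times more efficient the new GPUs are (e.g. 5.0).
    Non-GPU power is assumed to stay unchanged.
    """
    if not 0.0 <= gpu_share <= 1.0:
        raise ValueError("gpu_share must be between 0 and 1")
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    return gpu_share * (1.0 - 1.0 / efficiency_gain)


def gpu_share_needed(target_reduction: float, efficiency_gain: float) -> float:
    """GPU power share that would produce a given total reduction."""
    if efficiency_gain <= 1:
        raise ValueError("efficiency_gain must be greater than 1")
    return target_reduction / (1.0 - 1.0 / efficiency_gain)


# Hypothetical GPU power shares, purely for illustration.
for share in (0.5, 0.625, 0.8):
    saved = total_power_reduction(share, 5.0)
    print(f"GPU share {share:.1%}: total power drops by {saved:.1%}")

print(f"Share needed for a 50% drop at 5x: {gpu_share_needed(0.5, 5.0):.1%}")
```

Under these assumptions, a 50% total reduction from a 5x GPU gain corresponds to the GPUs drawing a bit under two thirds of the power; only if the GPUs drew all of it would the reduction reach 80%.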
While we have you here, could you please clarify a point in your privacy policy?

> For data collected from the UI or other usage: We retain the personal information described in this privacy notice for as long as you use our Services

I have two quick questions:

1. Why are UI prompts and responses kept for the entire life of the account?

2. When an account is closed, is the data actually deleted or just de-identified?

> Nvidia's "rack scale" machines like GB200-NVL72s and GB300-NVL72s are basically a fully built rack you roll into a DC and plug into power and network. In that case, Oracle should probably just buy the rack-scale Vera Rubins when they come out instead of Blackwells and roll them into their new DCs.

This is what I don't understand. Why is the article making the assumption that the DC itself is tied to a particular GPU generation? AWS doesn't knock down a building and start over every time Intel releases a new Xeon.

Xeons have a much longer shelf life and diverse workloads. If you order hardware specifically for LLM inference and then some new hardware/model combination is much better at that (which it will be, because a lot of people are working on that), you might be in trouble.

It's like setting up a warehouse of GPUs to mine bitcoin while others are switching to ASICs.

Training, you mean. Doing inference on last year's chip is probably ok, but training a frontier model on it is going to be a deal breaker.

No, I mean inference. The idea is that inference demand will be massive and a race to the bottom with razor-thin margins.

Training costs can be amortized over the entire lifetime of the model, but if you lose money on inference or can't offer competitive usage limits for subscribers, there's no amortizing that.

No, it's all about having the top model first, and training time is what's crucial. OpenAI has already shown willingness to bleed money for the sake of brand and we can expect that to continue.

OpenAI economics don't really work unless you happen to be OpenAI.

Infiniband and coherent fabric.

There are two generations and 4.5 years between A100 and B200.

A100 has 312 TFLOPS of FP16 for 250W, i.e., 1.25 TFLOPS/W.

B200 has 2250 TFLOPS of FP16 compute for 1000W, i.e., 2.25 TFLOPS/W.

This is ~34% growth per generation and ~14% per year. It's hard to believe it will be 400% per generation this time.
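For anyone who wants to check the compounding, here is a short sketch that reproduces those growth figures from the numbers quoted above (312 TFLOPS at 250W, 2250 TFLOPS at 1000W, two generations, 4.5 years). The last line, how many generations it would take to reach 5x at that pace, is an extrapolation added for illustration.

```python
import math

# Figures as quoted in the comment above (FP16 TFLOPS and watts).
A100_TFLOPS, A100_WATTS = 312.0, 250.0
B200_TFLOPS, B200_WATTS = 2250.0, 1000.0
GENERATIONS = 2
YEARS = 4.5


def compound_growth(total_ratio: float, periods: float) -> float:
    """Per-period growth rate that compounds to total_ratio over `periods`."""
    return total_ratio ** (1.0 / periods) - 1.0


a100_eff = A100_TFLOPS / A100_WATTS  # TFLOPS per watt
b200_eff = B200_TFLOPS / B200_WATTS
ratio = b200_eff / a100_eff

per_generation = compound_growth(ratio, GENERATIONS)
per_year = compound_growth(ratio, YEARS)

# Extrapolation: generations needed for a 5x gain at the same pace.
generations_to_5x = math.log(5.0) / math.log(1.0 + per_generation)

print(f"A100: {a100_eff:.2f} TFLOPS/W, B200: {b200_eff:.2f} TFLOPS/W")
print(f"Growth per generation: {per_generation:.1%}")
print(f"Growth per year: {per_year:.1%}")
print(f"Generations to reach 5x at this pace: {generations_to_5x:.1f}")
```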

It might be 400% in the one thing everyone is interested in.

You're thinking in FP16. Nobody uses FP16 for inference anymore. The 400% is probably for FP4/INT4 computation.

Tensor core performance is inversely proportional to precision across all generations (i.e., reducing precision by a factor of 2 increases OPS by a factor of 2). 8-bit precision will give you the same improvement ratio. A100/H100 didn't support 4-bit if I remember correctly.

So FP4/INT4 will likely improve by the same ~30% in OPS/W. You could get a separate improvement by reducing precision, but going 1-bit for a 4x improvement feels unlikely for now.

> Oracle should probably just buy the rack-scale Vera Rubins when they come out instead of Blackwells and roll them into their new DCs. Tada! Tomorrow's DCs, tomorrow.

Or we’ll get a supply problem and they get nothing or not enough. Tomorrow’s DC, never. Tada!

> Or maybe construction of the physical DCs is behind schedule, so today's Blackwells are sitting around unused, waiting for power and networking tomorrow. Then they're in a bit of trouble.

Other reporting says this is very much the case. Stargate barely has some of the land cleared, but the buildings were supposed to be finished and have GPUs installed over the course of 2026.

There's also the indicator of Nvidia giving out billion-dollar deals to other companies so that they could commit to buying even more Blackwells to keep production going. The chips from those new deals don't have anywhere to go: everyone already spent their cash on chips that have shipped and that they're still installing today (apparently some are even in warehouses).

Thank you for Synthetic.new

I moved over from OpenRouter and it's been breezy. I hope you are sustainable at $30/month and are successful!

Hey Reiss, I just checked Synthetic. So nice to see indie providers for smaller LLMs. I am personally building products to run only with small (actually < 20b) models. My aim is for laptop usage. Would love to know what plans you have for models smaller than the ones you have currently. Industrial use is all about smaller models IMHO.

> The problem appears to be that Oracle is building today's DCs... Tomorrow.

By the time Vera Rubins are available at scale, will they immediately be put into DCs, or will tomorrow's chips be running... the day after tomorrow?

Isn't this a problem for everyone buying Nvidia GPUs at scale?

I think the difference is that the other hyperscalers are doing this out of the enormous cash rivers produced by their other profitable businesses, at a rate less than that at which profits are flowing in, whereas Oracle is funding it out of debt, with AI capex in 2026 projected to reach levels nearly as high as their expected revenue (not profits) in the same period.

If the hardware refresh rate makes a substantial share of data center cost function more like opex than capex, the companies funding it out of operations (especially from operations of what are essentially monopoly businesses, in the sense of pricing power), even if it isn’t the operations it powers specifically, are fine in the near-to-intermediate term (barring exogenous shocks to those other businesses), whereas Oracle, funding it by a debt bonanza, is in a different position.

Google, Amazon, Meta, etc. don't have to wait 12 or 24 months for their big data center to open. They already have lots of DCs to cram all the Nvidia cards into, right now.

Definitely not true. Meta is building tents for GPUs.

> Meta is building tents for GPUs

And Starlink / xAI is going to shoot them into space. We are simultaneously living in the future and the past.

> And Starlink / xAI is going to shoot them into space.

I highly doubt that. They claim they want to shoot them into space, but I don’t believe a word of it until I see it happen (and see it work). It’s no more real than hyperloop.

DCs in space is hype that makes no rational sense when you figure in the size of the radiators you'll need, and while solar cells are more efficient in space, they aren't that much better.

The Google paper (https://arxiv.org/pdf/2511.19468) didn’t seem too concerned with radiator mass/size when I skimmed it, but maybe I just missed it. My understanding is that if you run the chips relatively hot (and maybe boost with heat pumps? But then you’re not quite as solid state, and maintenance is tough up there), the radiation ability increases enough that you can make the radiators slightly smaller than the solar panels, and they’d sit on the dark side of the panels. Many people like to point to the ISS system and scale that up, but there’s a big difference between a system assembled in space and meant to keep humans at human temps vs one mass-manufactured on the ground and keeping things around 100C.
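A rough way to see why running hot helps: radiated power per unit area goes as the fourth power of absolute temperature (Stefan-Boltzmann). The sketch below compares a radiator at around 100C with one at a human-comfortable 20C. The emissivity of 0.9 and the 20C comparison point are assumptions for illustration, and absorbed sunlight and Earth's infrared are ignored, so treat the absolute numbers as upper bounds.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)


def radiated_flux(temp_c: float, emissivity: float = 0.9) -> float:
    """Power radiated per square metre of one radiator face, in W/m^2.

    Idealized: no absorbed sunlight or Earth IR, uniform surface temperature.
    The default emissivity is an assumed value, not from any spec.
    """
    temp_k = temp_c + 273.15
    return emissivity * SIGMA * temp_k ** 4


hot = radiated_flux(100.0)   # electronics-temperature radiator
cool = radiated_flux(20.0)   # human-temperature radiator (assumed comparison)

print(f"At 100C: {hot:.0f} W/m^2 per face")
print(f"At 20C:  {cool:.0f} W/m^2 per face")
print(f"Ratio:   {hot / cool:.2f}x less radiator area needed when hot")
```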
Well, the sun is always up in space, so yeah, they're at least 3x better from that alone.

Tents? Like... the kind you sleep in in the woods?

I think it's less a matter of space and more a matter of power availability.

Only the ones that require profit from the GPUs.

Interested to know more about your inference startup. How are you guys operating: do you own hardware or use the cloud?

Next servers might need more power or different cooling. Then your DCs are just big concrete rooms.

All DCs are big concrete rooms that can supply so much power per unit area and remove so much heat per unit area (the two are related, of course, since the heat comes from dissipating the power). Variation is just in the density of whatever sort of fancy resistor you plan to put in the concrete room.

Thanks! "Thinking resistors" will be the name of my future AI-punk band.

The issue is really about whether the DC is water-cooling capable.

Supply chain at volume makes it hard to do little else.

I love how you explained this