Hacker News
by creato 544 days ago
NVIDIA is obviously not above market segmentation via dubious means (see: driver limitations for consumer GPUs), but I think binning due to silicon defects is a more likely explanation in this case.

Some 4090s have this extra fp16 -> fp32 ALU disabled because it's defective on that chip.

Other 4090s have it disabled because it failed as an Ada 6000 for some other reason, but NVIDIA didn't want to make a 4095 SKU to sell it under.

Or if you generalize this for every fusable part of the chip: NVIDIA didn't want to make a 4094.997 SKU that only one person gets to buy (at what price?)

7 comments

Depending on who you ask, binning is segmentation. Generally demand isn't going to exactly match how the yields work out, so companies often take a bunch of perfectly good high-end chips, nerf them, and throw them in the cheapo version. You used to be able to (and still can, in some cases) take a low-end device and, if you'd won the chip lottery, "undo" the binning and have a perfectly functional high-end version. For some chips, almost all the nerfed ones had no defects. But manufacturers like nVidia hated it when customers pulled that trick, so they started making sure it was impossible.
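To make the "demand doesn't match yields" point concrete, here's a toy sketch. Everything in it is an assumption for illustration: the two-SKU split, the `demand_high` number, and the rule that any die with a defect goes to the low SKU. It is not how any real vendor's binning flow works.

```python
def bin_chips(defect_counts, demand_high):
    """Split dies into SKUs in a toy two-tier model.

    defect_counts: defective-unit count per die (0 means fully working).
    demand_high:   hypothetical number of high-end parts the market absorbs.
    """
    perfect = [i for i, d in enumerate(defect_counts) if d == 0]
    defective = [i for i, d in enumerate(defect_counts) if d > 0]

    high = perfect[:demand_high]         # sold as the high-end SKU
    down_binned = perfect[demand_high:]  # fully working, fused down anyway

    return {
        "high": high,
        "low_defective": defective,
        "low_down_binned": down_binned,
    }


# Five dies, three of them perfect, but demand for only two high-end parts.
result = bin_chips([0, 0, 1, 0, 2], demand_high=2)
print(result)
```

The dies in `low_down_binned` are the "chip lottery" winners: nothing is wrong with them, they just exceeded high-end demand in this model.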
> You used to be able to (and still can, in some cases) take a low-end device and, if you'd won the chip lottery, "undo" the binning and have a perfectly functional high-end version.

For the purposes you tested it for, sure. Maybe some esoteric feature you don't use is broken. NVIDIA still can't sell it as the higher end SKU. The tests a chip maker runs to bin their chips are not the same tests you might run.

I'm sure chip makers make small adjustments to supply via binning to satisfy market demand, but if the "technical binning" is too far out of line from the "market binning", that's a lot of money left on the table that will get corrected sooner or later.

edit: And that correction might be in the form of removing redundancies from the chip design, rather than increasing the supply/lowering the price of higher end SKUs. The whole point here is, that's two sides of the same coin.

Disabling cores that have 100% passed QA is quite commonplace, especially for chips that have been on the market for over a year and thus are being built with yields as mature as they're going to get.

Artificially restricting supply of high-end chips and increasing supply of mid-range chips by disabling fully functional cores is how chip makers preserve their pricing structure. Without doing this, market pressures would force prices down on high-end chips and cause lower bins to mostly disappear from the market, leaving the product line with lower overall margins and a PR nightmare every time a new generation launches with pricing reset back to the initially high levels.

As a rule of thumb: if a chip product line goes a whole year without having new SKUs show up with a higher percentage of cores enabled or higher clock speeds for the same core count, then the manufacturer is artificially restricting supply to make more of the lower-bin parts than naturally occur in the fab output.

> And that correction might be in the form of removing redundancies from the chip design, rather than increasing the supply/lowering the price of higher end SKUs.

Those two courses of action take place on completely different timescales. Disabling cores and other binning tricks can be implemented in no more than a few months. Adding a new chip with a different number of copies of the same IP blocks takes well over a year. Removing redundancy within an IP block (e.g. by having fewer spare SRAM blocks for a cache of a fixed capacity) isn't going to happen within a single chip generation.

In the semiconductor world, corrections of any kind tend toward "later" rather than "sooner".

Dumb question (maybe): I'm aware that "tester time" is very expensive for advanced integrated circuits. Could it be that disabled cores are actually "unknown" i.e. probably good, but money was saved by not even testing them?
It's more likely that any defect in a core causes the whole core to be disabled. Especially in this case, where I assume the FP16 x FP16 -> FP32 path uses the same hardware as the FP16 x FP16 -> FP16 path.
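For anyone unsure what the two paths mean, here's a rough software sketch, assuming the difference is only the width of the accumulator (that's my reading, not something confirmed here). It emulates the rounding with Python's `struct` half-precision format and says nothing about how the hardware is actually wired. The helper names are made up.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, round_acc):
    """Dot product of FP16 inputs; round_acc sets the accumulator precision."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_fp16(x) * to_fp16(y)  # exact in double for FP16 inputs
        acc = round_acc(acc + prod)     # round once per accumulate step
    return acc


ones = [1.0] * 4096
fp16_acc = dot(ones, ones, to_fp16)  # FP16 x FP16 -> FP16
fp32_acc = dot(ones, ones, to_fp32)  # FP16 x FP16 -> FP32
print(fp16_acc, fp32_acc)
```

The FP16 accumulator stops growing at 2048, because 2049 isn't representable in half precision and rounds back down, while the FP32 accumulator reaches 4096. Same multiplies, different accumulate step.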
Exactly. They can easily sell more Ada 6000s, and I'm pretty sure they would do so rather than sell them for much less as 4090s.
I think this is just like what Intel does.

Runs fast? i9. Slower? i7. Missing cores? i5. Slowest? i3.

Perfect chips probably not only have all the cores working, they also run at low voltages, so they don't get as hot.

I wonder if they can figure out what parts of the chip run at what speeds, and disable the ones that run slow/hot.

I'm pretty sure GPUs are overclocked by vendors, so there must be some sort of binning, either by the vendors or because they buy binned parts. I'll bet if parts could go faster, you would have an ASUS/MSI/etc 4090-2x-max-$$$$

https://www.tomshardware.com/reviews/glossary-binning-defini...

I recall reading that as yields improved with process maturity, Intel ended up binning chips that passed at faster speeds as lower SKUs just to meet demand.
I'm not sure that it's implemented as a separate fp16 ALU. There are cute ways to share logic between a dual fp16 ALU and a single fp32 ALU such that it's really just one ALU with those being different ops.
As I understand it, that's how the original MMX got started. It was largely reusing the x87 ALU, but breaking the carry chains at the obvious points.
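A software analogy for "breaking the carry chains", for what it's worth. This is the classic SWAR trick on integers, not a description of how MMX or any NVIDIA ALU is actually built; the function name and lane layout are just for illustration.

```python
MASK_HI = 0x8000_8000_8000_8000  # top bit of each 16-bit lane
MASK_LO = 0x7FFF_7FFF_7FFF_7FFF  # low 15 bits of each 16-bit lane
MASK_64 = 0xFFFF_FFFF_FFFF_FFFF


def packed_add16(a, b):
    """Add four 16-bit lanes packed into 64-bit ints, wrapping per lane."""
    # Add the low 15 bits of every lane. Each lane's sum is at most 0xFFFE,
    # so no carry can spill into the neighbouring lane.
    low = (a & MASK_LO) + (b & MASK_LO)
    # Fold the top bits back in with XOR, which adds them without a carry out.
    return (low ^ ((a ^ b) & MASK_HI)) & MASK_64


a = 0x0001_FFFF_8000_7FFF
b = 0x0001_0001_8000_0001
print(hex(packed_add16(a, b)))   # lanes wrap independently
print(hex((a + b) & MASK_64))    # plain add: carries leak across lanes
```

One wide adder does the work of four narrow ones as long as the carry is stopped at the lane boundaries, which is the logic-sharing idea in a nutshell.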
There must be a whole layer of reroute-if-defective plumbing in the drivers.
I don't even understand why binning non-defect cards is dubious.

It's like the logic people have on /r/pcmasterrace is that if they didn't bin, they would just release all 4090s at 4080 prices. No, there would just be less 4080s for people to buy. No chip maker is going to sell their chips at sub-market rates just because they engineered them to/have a fab that can produce them at very low defect rates.

Now, Nvidia certainly has done dubious things. They've hurt their partners (EVGA, you're missed), and skyrocketing the baseline GPU prices is scummy as hell. But binning isn't anything I necessarily consider dubious.

> No, there would just be less 4080s for people to buy.

Sure, and more 4090s at a lower price.

Nvidia wouldn't leave money on the table dropping the price of the 4090. There would just be more supply of 4090s at the same price. A card manufacturer selling them below MSRP would get immediately sanctioned by Nvidia.
Even a monopolist is constrained by supply and demand. If they could sell everything they make as a 4090 without trashing their margins, they would. The fact that 4080, 4070, 4060 lines exist means they can't.
Right. And the fact that a BMW 2-series exists means that BMW can't sell all the 7-series' that they want.

They're cutting down 4090s into 4080s to fulfill a demand for a cheaper chip, while still supplying their premium option. Your fanciful world concept of things being sold for no/minimum margins is just that: fanciful.

This starts as binning and ends up at down binning :)