by Red_Comet_88 497 days ago
Does anyone know why Nvidia chose to reuse the 4N process for their Blackwell series? From everything I've read, N3 is mature and already in full production, yet Nvidia chose to just reuse 4N. It seems very much unlike Nvidia to leave performance on the table.
5 comments

N3 is mature and already in full production only for small mobile SoCs.

Blackwell goes up to ~750 mm², which makes it a completely different beast. And Nvidia is already having trouble filling Blackwell orders on a higher-capacity, relatively mature N4 node. Imagine doing it on an expensive N3 and then charging $4999, only to get called a rip-off on HN and Reddit.

Generally speaking, large-die, high-performance chips tend to be a node behind, simply because the leading-edge nodes and tools aren't even designed for them; they are aimed specifically at mobile SoCs. Then you add in the cost and yield issues.
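
To put rough numbers on the die-size point, here is a minimal sketch using the simple Poisson yield model, Y = exp(-A * D0). The defect density of 0.1 defects/cm² and the 100 mm² mobile-SoC area are illustrative assumptions, not TSMC figures; only the ~750 mm² figure comes from the comment above.

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under the Poisson yield model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 = 1 cm^2
    return math.exp(-area_cm2 * defects_per_cm2)

# Hypothetical inputs: D0 is an assumed defect density, not a published number.
D0 = 0.1  # defects per cm^2 (assumption)
for name, area in [("mobile SoC (assumed ~100 mm^2)", 100.0),
                   ("large GPU (~750 mm^2)", 750.0)]:
    print(f"{name}: {poisson_yield(area, D0):.1%} defect-free dies")
```

With these assumed inputs the small die comes out around 90% defect-free and the large one under 50%. Real products salvage partially defective dies by disabling units, so this overstates the loss, but the exponential dependence on area is the point.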

Surely you don't believe that large chips like Apple's M3 Max and M4 Max shipped while yields were still immature. So do you think that the wafers per month that TSMC is now cranking out, over a year after N3B chips started landing in consumers' hands, still don't qualify as "full production"? How many fabs need to be fully devoted to 3nm before it is enough volume for you to consider it "full production"?
The number of M3 and M4 SKUs suggests they have yield problems and are disabling bad memory and cores.
"Disabling bad memory" as in DRAM isn't a thing that happens, to anybody. DRAM is made in its own fabs and goes through QA before being packaged. So whether it lands onto DIMMs or in a SoC package, it's known-good dies that are being used.

And you cannot look at the number of SKUs without also taking into account how many different die designs are being manufactured and binned to produce that product line. Intel and AMD CPUs have far more bins per die, and usually fewer different die sizes as a starting point. Apple isn't manufacturing an M3 Max and sometimes binning that down to an M3 Pro, or an M3 Pro down to an M3. You're really just seeing about two choices for enabled core count from each die, which is not any kind of red flag.
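
As a rough illustration of why two core-count bins per die is unremarkable, here is a sketch that splits dies into a full bin, a cut-down bin, and scrap, assuming each core fails independently with some probability. The core count, per-core failure probability, and number of cores that may be disabled are all made-up values, not Apple's, and `bin_fractions` is a hypothetical helper.

```python
from math import comb

def bin_fractions(cores: int, p_bad: float, max_disabled: int) -> dict:
    """Split dies into full / cut-down / scrap by number of defective cores."""
    def p_exact(k: int) -> float:
        # Binomial probability of exactly k defective cores.
        return comb(cores, k) * p_bad**k * (1 - p_bad)**(cores - k)

    full = p_exact(0)
    cut_down = sum(p_exact(k) for k in range(1, max_disabled + 1))
    return {"full": full, "cut_down": cut_down, "scrap": 1.0 - full - cut_down}

# Hypothetical die: 16 cores, 3% chance each core is bad, up to 2 may be disabled.
for bin_name, frac in bin_fractions(16, 0.03, 2).items():
    print(f"{bin_name}: {frac:.1%}")
```

Even a small per-core failure rate leaves a meaningful share of dies with one or two bad cores, so a single cut-down bin recovers most of them. In practice vendors also cut down fully working dies to meet demand for the cheaper part, which this sketch ignores.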

Memory = on-chip cache. The M4 Max has loads of it....
Disabling cache as a binning strategy isn't too common these days, unless it's a cache slice associated with a CPU or GPU core that's being disabled. Large SRAMs are usually manufactured with some spare cache lines so that they can tolerate a few defects while still operating at the nominal capacity. SRAM defects are usually not the driving force behind a binning decision.
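
A rough way to see why spare lines take SRAM out of the binning picture: if the defect count in an array is Poisson-distributed and each defect can be repaired by one spare line, the array keeps its full capacity whenever the defect count does not exceed the spare count. The mean defect count and spare counts below are made-up illustrative values, and one defect per spare is a simplifying assumption.

```python
import math

def repairable_fraction(mean_defects: float, spares: int) -> float:
    """P(defects <= spares) for a Poisson-distributed defect count."""
    return sum(math.exp(-mean_defects) * mean_defects**k / math.factorial(k)
               for k in range(spares + 1))

# Hypothetical array averaging 0.5 defects; vary the number of spare lines.
for spares in (0, 1, 2, 4):
    print(f"{spares} spares: {repairable_fraction(0.5, spares):.4f} at full capacity")
```

Under these assumptions, a handful of spares pushes the fraction of fully usable arrays very close to 1, which is consistent with SRAM defects rarely deciding the bin.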

Back when Intel was stagnant at 4 cores for the bulk of their consumer CPU product line, they did stuff like sell i7 parts with 8MB L3 cache and i5 parts with 6MB cache, more as a product segmentation strategy than to improve yields (they once infamously sold a CPU with 3MB last level cache and later offered a software update to increase it to 4MB, meaning all chips of that model had passed binning for 4MB). Nowadays Intel's cache capacities are pretty well correlated with the number of enabled cores. AMD usually doesn't vary L3 cache sizes even between parts with a different number of enabled cores: you get 32MB per 8-core chiplet, whether you have 8 cores or 6 cores enabled.

I don't know to what extent the cache sizes on Apple's chips vary between bins, but it probably follows the pattern of losing only cache that's tied to some other structure that gets disabled.

This. The IP they need, specifically the SerDes, isn't available for 3nm (N3E) yet. That stuff isn't required for consumer devices, so it comes later.
I'm not familiar enough with the process to even know what to search for here. Do you have any recommended reading on how IP becomes available for different processes, and what the engineering difficulties and process are, especially for different types of IP?
I'm sorry, I don't have that. It's partly supply and demand, and looking at what the early adopters (Apple, these days) need. And then there's interplay with specs and related roadmaps (i.e. nobody develops their HBM PHY IP until the spec is mature enough). HPC traditionally requires specialist memory IP (e.g. CAM) and more SRAM varieties, so that comes later. SerDes also comes later since it pushes the process limits, both analogue and digital, and is also tied to Ethernet standards. You might find public marketing material from Synopsys/Cadence?
N3B is used by Intel for its Lunar Lake, Arrow Lake S and Arrow Lake H CPUs and GPUs, so it is also in full production for high-power non-small and non-mobile chips.

Of course, even if N3B is fine for desktop CPUs and for smaller GPUs, it may still have too low yields for chips of the size of the top models of NVIDIA GPUs.
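
For a sense of scale, here is a sketch of gross dies per 300 mm wafer using a common approximation (wafer area divided by die area, minus an edge-loss term). It counts candidate dies only and says nothing about yield. The 250 mm² "smaller GPU" area is an assumed value for illustration; ~750 mm² is the Blackwell figure mentioned earlier in the thread.

```python
import math

def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Approximate die count: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2.0
    whole = math.pi * radius * radius / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(whole - edge_loss)

# 250 mm^2 is an assumed mid-size die; 750 mm^2 is the large-GPU case.
for area in (250.0, 750.0):
    print(f"{area:.0f} mm^2 die: ~{gross_dies_per_wafer(area)} candidates per wafer")
```

The large die starts with only a few dozen candidates per wafer, so every defect that kills one costs a much bigger share of the wafer than it would for a smaller chip.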

Anything serious is using N3E.
I think yield is the main reason, although I'm not an expert in that area. If it's a standard deviation difference, then it will increase their manufacturing costs. Nvidia got around this from Ada -> Blackwell by simply increasing TDP for their high-end cards and adding more cores. There is no IPC/efficiency/node improvement, if I'm not mistaken, for the first time gen-to-gen.
The answer is probably profit margin. It's also not unusual at all for Nvidia to leave performance on the table in terms of production nodes. I mean, just look at the RTX 3000 series, which was made with Samsung 8LPH, categorized as a "10nm node". Even at the time, for a late 2020 launch, TSMC already had several generations which were better, in both the "7nm" and "5nm" categories.
The US push for local manufacturing might play into that a bit, since TSMC's Arizona fab only just started producing 4nm. Nvidia is planning to provide GPUs for major US datacenter expansion, so using locally produced chips should help significantly (especially if the chip tariffs end up becoming a reality).
My guess is it's related to yield and/or large die sizes making them more susceptible to defects. I expect architectural changes in the Blackwell series matter more than performance improvements from process node.
Doesn't Apple pay for exclusive access to the N and N+1 nodes ... shutting everyone else out from using newer node sizes?
That’s my understanding, more or less. It’s not like Apple pays to have capacity sitting idle, they just outbid everyone else to buy (and use) 100% of capacity.
I believe Apple also provides TSMC with the capital needed to even do the R&D to shrink the nodes in the first place.