by ksec 497 days ago
N3 is mature and already in full production, but only for small mobile SoCs.

Blackwell goes up to ~750 mm², which makes it a completely different beast. And Nvidia is already having trouble meeting Blackwell demand on N4, a relatively mature node with higher capacity. Imagine doing it on an expensive N3 and then charging $4999, only to get called a rip-off on HN and Reddit.

Generally speaking, large-die, high-performance chips tend to be a node behind, simply because the leading-edge nodes and tools aren't initially designed for them; they specifically target mobile SoCs. Then you add in cost and yield issues.
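
To put rough geometry behind the die-size point, here is a minimal sketch using the common dies-per-wafer approximation (wafer area divided by die area, minus an edge-loss term). The 300 mm wafer diameter is standard; the 100 mm² "mobile SoC" area is a hypothetical stand-in for comparison, not a figure from this thread, and the formula ignores scribe lines, edge exclusion, and reticle constraints.

```python
import math


def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Approximate gross (pre-yield) die count on a round wafer.

    Uses the textbook approximation
        DPW ~= pi*d^2 / (4*A) - pi*d / sqrt(2*A)
    where d is the wafer diameter and A is the die area.
    """
    area_term = math.pi * wafer_diameter_mm ** 2 / (4.0 * die_area_mm2)
    edge_loss_term = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return max(0, math.floor(area_term - edge_loss_term))


if __name__ == "__main__":
    # 750 mm^2 is the Blackwell-class figure quoted above;
    # 100 mm^2 is an assumed, illustrative mobile-SoC size.
    for label, area in [("~750 mm^2 die", 750.0), ("assumed 100 mm^2 SoC", 100.0)]:
        print(f"{label}: about {gross_dies_per_wafer(area)} gross dies per 300 mm wafer")
```

Under these assumptions the large die gets roughly an order of magnitude fewer candidates per wafer before any yield loss is counted, which is why wafer cost matters so much more for it.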


Surely you don't believe that large chips like Apple's M3 Max and M4 Max shipped while yields were still immature. So do you think that the wafers per month TSMC is now cranking out, over a year after N3B chips started landing in consumers' hands, still don't qualify as "full production"? How many fabs need to be fully devoted to 3nm before it is enough volume for you to consider it "full production"?
The number of M3 and M4 SKUs suggests they have yield problems and are disabling bad memory and cores.
"Disabling bad memory" as in DRAM isn't a thing that happens, to anybody. DRAM is made in its own fabs and goes through QA before being packaged. So whether it lands onto DIMMs or in a SoC package, it's known-good dies that are being used.

And you cannot look at the number of SKUs without also taking into account how many different die designs are being manufactured and binned to produce that product line. Intel and AMD CPUs have far more bins per die, and usually fewer different die sizes as a starting point. Apple isn't manufacturing an M3 Max and sometimes binning that down to an M3 Pro, or an M3 Pro down to an M3. You're really just seeing about two choices for enabled core count from each die, which is not any kind of red flag.

Memory = on-chip cache. The M4 Max has loads of it...
Disabling cache as a binning strategy isn't too common these days, unless it's a cache slice associated with a CPU or GPU core that's being disabled. Large SRAMs are manufactured usually with some spare cache lines so that they can tolerate a few defects while still operating with the nominal capacity. SRAM defects are usually not the driving force behind a binning decision.
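
As a rough illustration of why spare lines make SRAM defects a minor binning factor, here is a minimal sketch. It assumes defects landing in one SRAM array follow a Poisson distribution with a hypothetical mean, and that each defect can be repaired by exactly one spare line; real repair schemes (row vs. column spares, clustered defects) are more complicated, and none of the numbers come from Apple, Intel, or AMD.

```python
import math


def repairable_probability(mean_defects: float, spare_lines: int) -> float:
    """Probability that an SRAM array still delivers its nominal capacity.

    Simplified model: the defect count is Poisson(mean_defects), and the array
    is repairable whenever the defect count does not exceed the spare count.
    """
    return sum(
        math.exp(-mean_defects) * mean_defects ** k / math.factorial(k)
        for k in range(spare_lines + 1)
    )


if __name__ == "__main__":
    mean_defects = 1.0  # hypothetical average defects per array
    for spares in (0, 1, 2, 4):
        p = repairable_probability(mean_defects, spares)
        print(f"{spares} spare lines -> {p:.3f} chance of full nominal capacity")
```

With no spares the array is good only when it has zero defects; a handful of spares pushes the usable fraction close to 1 in this toy model, which matches the point that cache is rarely what forces a part into a lower bin.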

Back when Intel was stagnant at 4 cores for the bulk of their consumer CPU product line, they did stuff like sell i7 parts with 8MB L3 cache and i5 parts with 6MB cache, more as a product segmentation strategy than to improve yields (they once infamously sold a CPU with 3MB last level cache and later offered a software update to increase it to 4MB, meaning all chips of that model had passed binning for 4MB). Nowadays Intel's cache capacities are pretty well correlated with the number of enabled cores. AMD usually doesn't vary L3 cache sizes even between parts with a different number of enabled cores: you get 32MB per 8-core chiplet, whether you have 8 cores or 6 cores enabled.

I don't know to what extent the cache sizes on Apple's chips vary between bins, but it probably follows the pattern of losing only cache that's tied to some other structure that gets disabled.

Yes...cores (CPU and GPU) have large caches. If the cache (or associated slice) is busted, so is the core.
This. The IP they need, specifically the SerDes, isn't available for 3nm (N3E) yet. That stuff isn't required for consumer devices, so it comes later.
I'm not familiar enough with the process to even know what to search for here. Do you have any recommended reading on how IP becomes available for different processes and what the engineering difficulties are, especially for different types of IP?
I'm sorry, I don't have that. It's partly supply and demand, and looking at what the early adopters (Apple, these days) need. And then there's interplay with specs and related roadmaps (i.e. nobody develops their HBM PHY IP until the spec is mature enough). HPC traditionally requires specialist memory IP (e.g. CAM) and more SRAM varieties, so that comes later. SerDes also comes later since it pushes the process limits, both analogue and digital, and is also tied to Ethernet standards. You might find public marketing material from Synopsys/Cadence?
N3B is used by Intel for its Lunar Lake, Arrow Lake S and Arrow Lake H CPUs and GPUs, so it is also in full production for high-power non-small and non-mobile chips.

Of course, even if N3B is fine for desktop CPUs and for smaller GPUs, it may still have too low yields for chips of the size of the top models of NVIDIA GPUs.
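
A minimal sketch of why the same process can be fine for smaller dies and painful for the largest GPUs, using the simple Poisson yield model Y = exp(-A * D0). The defect densities below are hypothetical placeholders chosen only to show the trend; actual N3B or N4 defect densities are not public in this thread, and foundries use more refined models.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero killer defects under the Poisson model."""
    die_area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-die_area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    # Hypothetical defect densities (defects per cm^2), for illustration only.
    for d0 in (0.05, 0.10, 0.20):
        small = poisson_yield(100.0, d0)  # assumed small-SoC-sized die
        large = poisson_yield(750.0, d0)  # ~750 mm^2, top-end GPU class
        print(f"D0={d0:.2f}/cm^2: 100 mm^2 -> {small:.1%}, 750 mm^2 -> {large:.1%}")
```

Because yield falls exponentially with area, a defect density that costs a small die a few percent can cost a 750 mm² die half its output or more, before accounting for partial-die salvage through binning.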

Anything serious is using N3E.
I think yield is the main reason, although I'm not an expert in that area. If it's a standard deviation of difference, then it will increase their manufacturing costs. Nvidia got around this from Ada to Blackwell by simply increasing TDP for their high-end cards and adding more cores. If I'm not mistaken, this is the first time there is no IPC, efficiency, or node improvement gen-to-gen.