by NickHoff 1102 days ago
My question here is about underlying fab capacity. This chip is made on TSMC 4N, along with the H100 and 40xx series consumer GPUs. I assume Nvidia has purchased their entire production capacity. I also assume that Nvidia is using that capacity to produce the products with the highest margins, which probably means the H100 and this new GH200. So when they release this new chip, does it mean effectively fewer H100s and 4090s? Or is that not how fabrication capacity works?

I'm asking because whenever I look at ML training in the cloud, I never see any availability - either for this architecture or the A100s. AWS and GCP have quotas set to 0, lambda labs is usually sold out, paperspace has no capacity, etc. What we need isn't faster or bigger GPUs, it's _more_ GPUs.

10 comments

> This chip is made on TSMC 4N, along with the H100 and 40xx series consumer GPUs. I assume Nvidia has purchased their entire production capacity.

I don't know why you would assume that. Qualcomm has been using TSMC N4 since last year [1]. I'm sure there are other customers as well.

[1] https://www.anandtech.com/show/17395/qualcomm-announces-snap...

It sounds to me like the GH200 achieves more FLOPS per transistor. So compute demand will be satisfied more quickly via the GH200 than via "smaller" chips such as the H100.

Having said that, I don't think we're anywhere near some kind of equilibrium for AI compute. If chip supply magically doubled tomorrow, the large companies would buy it for their datacenters and have 100% utilization in a few weeks. They all want to train larger models and scale inference to more users.

In addition to training larger models, I'm sure there are many use cases that AI could serve that are currently cost prohibitive due to the cost of running inference.
I'd like bigger GPUs. A trillion parameter model at 16 bits needs 2000GB+ for inference, more for training. All kinds of things can be done to spread it across multiple GPUs, downsize to fewer bits, etc., but it's a lot easier to just shove a model on one GPU.
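For anyone who wants to check the arithmetic, here is a rough sketch in Python. It counts weights only (no activations, KV cache, or optimizer state), uses decimal gigabytes, and the function names and the 80GB card size are my own assumptions for illustration.

```python
import math


def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


def min_gpus_for_weights(weights_gb: float, gpu_memory_gb: float) -> int:
    """Lower bound on GPU count if the weights are split evenly with no overhead."""
    return math.ceil(weights_gb / gpu_memory_gb)


if __name__ == "__main__":
    trillion = 10**12
    for bits in (16, 8, 4):
        gb = weight_memory_gb(trillion, bits)
        # 80 GB per card is an assumed figure (e.g. an 80GB datacenter GPU).
        print(f"{bits}-bit: {gb:.0f} GB of weights, "
              f"at least {min_gpus_for_weights(gb, 80)} x 80GB GPUs")
```

Real deployments need more than this lower bound, since activations and framework overhead also take memory.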

We'll likely see more efficiency from bigger GPUs and hopefully more availability as a result.

TBH this is what all ML researchers and engineers have wanted for the past 10 years.
My question on the very slow growth of available memory: are there technical reasons they cannot trivially build a card with 100GB of RAM (even with lower performance) or has it been a business decision to milk the market for every penny?
High speed I/O pins cost a lot, and GDDR generally has 32 data pins per chip and no way to attach multiple chips to the same pins. So 256 bits and 16GB is hard to exceed by much on that tech. The high end is 384 bits and 24GB.

There is a mode to attach 16 data pins to each GDDR chip, so with some extra effort you could probably double that to 48GB. Or at least 32GB. Maybe this is a valid niche, or maybe there isn't enough demand.
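The numbers above fall out of simple division. A minimal sketch, assuming 2GB per GDDR chip (inferred from 16GB across a 256-bit bus with 32 pins per chip, not a figure stated in this thread); the function name is made up for illustration.

```python
def gddr_capacity_gb(bus_width_bits: int, data_pins_per_chip: int, gb_per_chip: int) -> int:
    """Total capacity when every chip gets its own group of data pins."""
    if bus_width_bits % data_pins_per_chip != 0:
        raise ValueError("bus width must be a multiple of the pins used per chip")
    num_chips = bus_width_bits // data_pins_per_chip
    return num_chips * gb_per_chip


if __name__ == "__main__":
    GB_PER_CHIP = 2  # assumption: 16 GB / (256 bits / 32 pins per chip) = 2 GB per chip
    for bus in (256, 384):
        normal = gddr_capacity_gb(bus, 32, GB_PER_CHIP)   # 32 data pins per chip
        doubled = gddr_capacity_gb(bus, 16, GB_PER_CHIP)  # 16-pin mode, twice the chips
        print(f"{bus}-bit bus: {normal} GB with 32 pins/chip, {doubled} GB with 16 pins/chip")
```

Halving the pins per chip doubles the chip count on the same bus, which is where the 32GB and 48GB figures come from.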

The alternative to this is HBM, which can stack up big amounts, but it's a lot more expensive.

I don't disagree with Dylan, but I'm more than willing to bet that the only reason Nvidia's cards (and that's who we're talking about. CUDA is a hell of a moat.) are RAM-starved is that they haven't felt the pressure to do otherwise. AMD has an institutional aversion towards good software. Intel isn't even an also-ran, yet.

Apple and their unified memory architecture may be the prod needed to get larger amounts of RAM available to single-card solutions. We'll see.

Nvidia has had unified memory for more than 6 years. This chip is just a faster interconnect for it.
> Or is that not how fabrication capacity works?

Fabs can run multiple complex designs on the same line simultaneously by sharing common tools. For example, photolithography tools can have their reticles swapped out automatically. Obviously, there is a cost to the context switching and most designs cannot be run on the same line as others.

Ultimately, the smallest unit of fabrication capacity is probably best measured at the granularity of the lot/FOUP (<100 wafers).

>Or is that not how fabrication capacity works?

The basics of supply chain and supply and demand are the same as what you should all have witnessed with toilet rolls during COVID.

Fab capacity is not that different from any other manufacturing. You just need to book that capacity way ahead of time (6-9 months). And that is also why I said 99% of news or rumours about TSMC capacity are pure BS.

So to answer your question: yes, Nvidia will likely go for the higher margin products. That is one of the reasons why you see Nvidia working with Samsung and Intel.

It's my understanding from friends in the business that the actual chips do not represent any capacity issue or bottleneck, it's actually manufacturing the devices that the chips are in (e.g. the finished graphics card).
Why would this be the case? I would naively think that since the chips can only be made in a fab and the rest can be made basically anywhere that that wouldn't be true.
They cannot be made "anywhere"; when you can't get that PMIC from the original manufacturer, good luck getting it from someone else. And replacing an IC in a QA-tested, EMC-verified, FCC- and CE-certified device will often force you to redo all of that, possibly requiring additional iterations. That is, if a similar part is available at all.

Take a look at a recent GPU and count the auxiliary components. All of them can cause supply chain difficulties.

That's...fascinating. There's enough space on TSMC but the PCB is the hard part?
For example, my company hit manufacturing issues (production capacity) with flash memory, with clock oscillators, and with an auxiliary FPGA. But production of the main chips was fine throughout the chip crisis, as far as I know. So yeah, small critical components can totally be a blocker. Some specific voltage controller is unavailable and suddenly your whole design is paralyzed.
PCBs are also full of a bunch of other components, many of which are hard to get ahold of right now.
I think that's it. PCB itself is rather trivial, it's the RAM, but also things like switching regulators (there are others, but then it's a redesign), maybe even stuff like connectors (which don't burn....).

For a science project, we need to manufacture magnets. It's not easy to find a company that has the right iron right now, and it's hard to get, with long lead times. The supply crisis is real.

I see A100 80GB cloud capacity available on both runpod.io and vast.ai currently.
You know, I was wondering this the other day when NVDA's insane run-up happened. I went down the road of trying to figure out if there were even enough silicon wafers, or if there even would be enough wafers in the next five years, to justify that price.

Unless all the planet does is make silicon wafers; no.

Well, you figured wrong - NVDA AI GPUs are a very small % of global foundry supply, and even if volume tripled, they would still be a small % of global foundry supply. NVDA's revenue is high because their gross margins are extreme, not because their volume is high.
Can you go into more detail? So you're saying that at a 200 P/E ratio NVDA there isn't even enough wafer supply for NVDA to grow into that valuation even over 5 years?
I mean, you've got the gist of it. I pulled some reports on silicon production, silicon wafer prices and price trends, current fab capacity, etc.

My back-of-the-napkin math basically suggested that silicon production would need to 4x and fab capacity would need to 4x (neither of which is happening), and NVDA would have to capture all of that to justify their current price. I didn't bother writing it up, just looked at it mostly because I was on the wrong side of that play. It's something worth considering for sure.

Wouldn't NVDA just focus more on high margin datacenter products in order to grow into those higher earnings while staying within the wafer limitation? Datacenter focused products are already starting to surpass gaming, which is their second largest revenue source: http://www.nextplatform.com/wp-content/uploads/2022/05/nvidi...

It seems to me that yes, while a 200 P/E may be high, they certainly could keep increasing the prices on the already high margin datacenter products, which get quickly gobbled up by companies no matter what the price is because of the immense demand.

We're probably ~3 years out from all of those fabs gov'ts funded coming online, right?

(n.b. that's really good work on your end and I agree with your conclusion, just idly musing about the thing that bugs me, what the heck all these non-leading edge fabs are going to do)

I believe availability is low because the GPUs are too expensive, so those that need to scale up use the older and much more affordable models.
I'm using Runpod and Datacrunch regularly and they seem to always have some available.