by bno1 1607 days ago
I wonder if things like this are the real reason behind the GPU shortage. How many other AI super computers are being built right now?
5 comments

This is definitely not the case. The A100, which is used for most "AI supercomputers", is manufactured in TSMC fabs, while Nvidia's gaming cards are produced in Samsung fabs. AMD produces its gaming GPUs at TSMC, but they are somewhere around 10% of the market, since AMD is unwilling to divert capacity from CPUs, which are more profitable, and from consoles (really not sure why).
I think these sorts of computers use special GPUs that are industrial and used specifically for AI/ML work. I don't believe they've powered the super computer with 3080s and I also don't think they use the same underlying chips either (albeit they are probably built with the same raw material that might be in short supply).
They take up chip fab capacity, and that’s the bottleneck. The fact that it would be a custom die doesn’t really make a difference (and at a high level the features that go on the chip aren’t really all that different either: the same stuff with various quantities and features tweaked).
They take up chip capacity at a different fab. You can't produce gaming Ampere at TSMC. They are built on different architectures that have only the name in common. The difference between "Ampere" and "Ampere" is bigger than the one between Volta and Turing, or maybe even than the difference between Pascal and Turing.
Good luck building special GPUs in just two years, especially with what's happening regarding chip production right now. Not sure how they could have achieved a project of this size/scope unless they use off-the-shelf components, since the backlogs are so long and have been for some time now.
Facebook has been hiring FPGA engineers with ML experience since 2018 so I don't think this would be out of the question! But even so, Nvidia sell custom GPUs that aren't the same ones for gaming.
They claim the work was done in just two years ("The supercomputer, the AI Research SuperCluster, was the result of nearly two years of work"). I'm not an expert in manufacturing, but I'd imagine it takes longer than two years to design > test > manufacture completely new chips.

> But even so, Nvidia sell custom GPUs that aren't the same ones for gaming.

Interesting. Gonna be fun to observe the outrage (from a distance) about how GPUs are not only used to destroy the environment for cryptocurrency profits, but now Facebook will also contribute to the world's destruction for ad money.

Yes, lots of GPUs have been purchased to be installed in compute clusters since ~2009. The deep learning boom only increased that. At least these are not being used to mine cryptocoins...
You can probably look up who TSMC is manufacturing custom chips for, since they’re a public company.
naa, they have at most ~30k GPUs. most of it is lack of capacity rather than large demand