by ryao
384 days ago
I did the math last year to estimate how many wafers per year Nvidia had, and from my recollection it was >50,000. Cerebras, with their ~300 per year, is not able to handle the inference needs of the market. It does not help that all of their memory must be inside the wafer, which limits the die area they have for actual logic. They have no prospect for growth unless TSMC decides to bless them or they switch to another foundry.

> While you may need several WSE-3s to load the model, if you have enough demand that you are running the WSE-3 at full speed you will not be using more area in the WSE-3.

You need ~20 wafers to run the Llama 4 Behemoth model on Cerebras hardware. This is close to a million mm^2. The Nvidia hardware that they used in their comparison should have less than 10,000 mm^2 of die area, yet can run it fine thanks to the external DRAM. How is the WSE-3 not using more die area?

> In fact, the WSE-3 may be more efficient, since it won't be loading and unloading things from large memories.

This makes no sense to me. Inference software loads the model once and then uses it multiple times. This should be the same for both Nvidia and Cerebras.
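
A quick sanity check of the arithmetic above, written as a small Python sketch. The ~20 wafers, the <10,000 mm^2 Nvidia figure, and the >50,000 vs ~300 wafers per year come from the comment. The ~46,225 mm^2 per WSE-3 is an assumption based on Cerebras's published die size, and the constant and function names are hypothetical.

```python
# Rough die-area and wafer-supply comparison using the comment's figures.
# Assumption: one WSE-3 is ~46,225 mm^2 (Cerebras's published die size).

WSE3_AREA_MM2 = 46_225        # assumed area of a single WSE-3
WAFERS_FOR_BEHEMOTH = 20      # "~20 wafers" per the comment
NVIDIA_AREA_MM2 = 10_000      # upper bound quoted for the Nvidia setup
NVIDIA_WAFERS_PER_YEAR = 50_000   # ">50,000" per the comment (lower bound)
CEREBRAS_WAFERS_PER_YEAR = 300    # "~300" per the comment


def cerebras_area_mm2(wafers: int, area_per_wafer: int = WSE3_AREA_MM2) -> int:
    """Total silicon area for a given number of wafer-scale engines."""
    return wafers * area_per_wafer


total = cerebras_area_mm2(WAFERS_FOR_BEHEMOTH)
# Nvidia's area is an upper bound, so this ratio is a lower bound.
area_ratio = total / NVIDIA_AREA_MM2
supply_ratio = NVIDIA_WAFERS_PER_YEAR / CEREBRAS_WAFERS_PER_YEAR

print(f"Cerebras area for Behemoth: {total:,} mm^2")
print(f"At least {area_ratio:.0f}x the Nvidia die area")
print(f"Nvidia has at least {supply_ratio:.0f}x the wafer supply")
```

With these inputs, the Cerebras setup uses 924,500 mm^2, roughly 92 times the Nvidia budget, while drawing on a wafer supply that is roughly 1/167 the size.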
|
Of course these guys depend on getting chips, but so does everybody. I don't know how difficult it is, but all sorts of entities get TSMC 5nm. Maybe they'll get TSMC 3nm and 2nm later than NVIDIA, but it's also possible that they don't.