With ASICs being so efficient at inference, I’m getting the feeling that even the infrastructure investments are going to be severely outdated in a couple of years.
Jensen Huang put it quite decently: "When Blackwell starts shipping in volume, you couldn't give Hoppers away."
All of the current GPU investments are gonna hit zero, and probably a lot faster than the companies buying them realise. Definitely a lot faster than the investors realise.
ASICs only work for very small and heavily quantized models. Moreover, they are fixed function hardware, so whenever you have a new model, you have to throw the current chips away and design and buy new ones. That's like buying a new CPU every time a new OS version comes out.
The latest strategies of etching weights into silicon seem like they can be generalized. We currently design GPU/TPU caching on the assumption that the weights change frequently. If the weights do not change at all, or change very slowly, then there are perhaps more efficient ways of laying out the memory on the chip, somewhere between permanently etching a model onto silicon and using GPUs designed for graphics computation.
I'm assuming that they will do a silicon etching run once a year. Might be an interesting acquisition opportunity for Apple since that's the rhythm of their device release.
It's a good point; it would be a nice "upgrade story" to get the next generation model. At a fixed cost of ~$1000 per model, it wouldn't be a bad deal relative to current API costs.
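For a rough sense of that trade-off, here's a back-of-envelope sketch of the break-even point. The ~$1000 figure is from the comment above; the API price is a made-up placeholder, not a quote from any provider, and the calculation ignores electricity and everything else a real comparison would need.

```python
def break_even_tokens(chip_cost_usd: float, api_price_per_million_usd: float) -> float:
    """Tokens you would have to process before a fixed-cost chip
    becomes cheaper than paying per token through an API."""
    if api_price_per_million_usd <= 0:
        raise ValueError("API price must be positive")
    return chip_cost_usd / api_price_per_million_usd * 1_000_000


chip_cost = 1000.0  # USD, the ~$1000 per model figure from the thread
api_price = 10.0    # USD per million tokens; hypothetical, for illustration only

tokens = break_even_tokens(chip_cost, api_price)
print(f"Break-even after about {tokens:,.0f} tokens")
```

With those placeholder numbers the chip pays for itself after roughly 100 million tokens; plug in your own API price to see how the picture changes.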
That would be something like an FPGA. Those have been very unpopular so far due to high cost, and they also only support a relatively small number of weights.
Surely it's more of a spectrum? From a CPU, to a TPU, to a chip that hardwires softmax attention but lets you store arbitrary weights, to one that hardwires the weights directly.
Forgive my ignorance but wouldn't a TPU be a kind of ASIC where the application is model inference? The TPU Wikipedia article also says it's an ASIC - we should update it if it's wrong.
In the limit, even a CPU could be called an ASIC because certain algorithmic operations (ALU etc) are implemented in hardware. CPU/ASIC are really poles of a gradient, with a CPU implementing very little in hardware and most in software, while an ASIC has very little software and lots of hardware. A TPU is presumably in between. I would argue however that it is closer to a GPU than to a full-blown ASIC, because the weights are stored in memory only, making them software.
> investments are going to be severely outdated in a couple of years
Compute in DCs already has an accounting lifespan of 3 years. The current trend of investments is a mix of expansion as well as upgrades to existing capacity.
This is why hyperscalers like Amazon, Microsoft, and Google invested in inference ASICs a couple of years ago, so they could migrate a larger share of their compute to these and offer services at better margins.
I've been thinking the same thing although not specifically about ASICs.
I was thinking any breakthrough in hardware (e.g. spintronics etc.), even if just partially effective, means all of this hardware would need to be replaced.
They don’t fail after 3 years; they're just a poor use of electricity once the next generation of silicon hits. It’s not economical to keep the old hardware running when it’s taking up rack space.
I assure you datacenter GPUs like B200 do fail regularly (within months in many cases), so much so that it's a problem for labs doing large training runs.