by kelipso 109 days ago
With ASICs being so efficient at inference, I’m getting the feeling even the infrastructure investments are going to be severely outdated in a couple of years.
5 comments

Jensen Huang put it quite decently: "When Blackwell starts shipping in volume, you couldn't give Hoppers away."

All of the current GPU investments are gonna hit zero, and probably a lot faster than the companies buying them realise. Definitely a lot faster than the investors realise.

When did Blackwell start shipping in volume? An H200 is still > $30,000.

I'd settle for some free 80GB A100 cards! ($7,000 second-hand on eBay right now)

ASICs only work for very small and heavily quantized models. Moreover, they are fixed function hardware, so whenever you have a new model, you have to throw the current chips away and design and buy new ones. That's like buying a new CPU every time a new OS version comes out.
The latest strategies of etching weights into silicon seem like they can be generalized. We currently design GPU/TPU caching on the basis that the weights change frequently - if the weights do not change at all, or change very slowly - then there are other, perhaps more efficient, ways of laying out the memory on the chip, somewhere between permanently etching a model onto silicon and using GPUs designed for graphics computation.
I'm assuming that they will do a silicon etching run once a year. Might be an interesting acquisition opportunity for Apple since that's the rhythm of their device release.
It's a good point; it would be a nice "upgrade story" to get the next generation model. At a fixed cost of ~$1000 per model, it wouldn't be a bad deal relative to current API costs.
That would be something like an FPGA, which has been very unpopular so far due to high cost. FPGAs also support only a relatively small number of weights.
That depends what kind of ASIC you’re talking about. Cerebras can run models like GLM 4.7 with 355B parameters.
Cerebras just uses SRAM instead of DRAM. An ASIC would instead hardwire the neural network.
Surely it's more of a spectrum? From a CPU, to a TPU, to a chip that hardwires softmax attention but lets you store arbitrary weights, to one that hardwires the weights directly.
Google’s training and running all their stuff on ASICs, seems to be working out well.
They're TPUs, same thing as GPUs but specifically for tensor ops.
TPUs are not ASICs if they can execute arbitrary models.
Forgive my ignorance but wouldn't a TPU be a kind of ASIC where the application is model inference? The TPU Wikipedia article also says it's an ASIC - we should update it if it's wrong.
In the limit, even a CPU could be called an ASIC because certain algorithmic operations (ALU etc) are implemented in hardware. CPU/ASIC are really poles of a gradient, with a CPU implementing very little in hardware and most in software, while an ASIC has very little software and lots of hardware. A TPU is presumably in between. I would argue however that it is closer to a GPU than to a full-blown ASIC, because the weights are stored in memory only, making them software.
> investments are going to be severely outdated in a couple of years

Compute in DCs already has an accounting lifespan of 3 years. The current trend of investments is a mix of expansion as well as upgrades on existing capacity.

This is why hyperscalers like Amazon, Microsoft, and GCP invested in inference ASICs a couple years ago, so they could migrate a larger mixture of their compute to these and offer services at better margins.

I've been thinking the same thing although not specifically about ASICs.

I was thinking any breakthrough in hardware (e.g. spintronics etc.), even if just partially effective, means all of this hardware would need to be replaced.

GPUs' mean lifetime is around 3 years at scale. They are going to need to be replaced anyway.
That’s okay by me. I’m ready to buy one or two on the cheap
By mean lifetime I mean “failure”. They won’t be any good
They don’t fail after 3 years; they're just a poor use of electricity once the next generation of silicon hits. It’s not economical to keep the old hardware running when it’s taking up rack space.
I assure you datacenter GPUs like B200 do fail regularly (within months in many cases), so much so that it's a problem for labs doing large training runs.
As long as they can be made to work in a consumer or homelab setting, they are still useful.
By “fail” I mean stop working