HN Mirror


	by kelipso 109 days ago
	With the ASICs being so efficient at inference, I’m getting the feeling even the infrastructure investments are going to be severely outdated in a couple of years.

5 comments

Hamuko 109 days ago

Jensen Huang put it quite decently: "When Blackwell starts shipping in volume, you couldn't give Hoppers away."

All of the current GPU investments are gonna hit zero, and probably a lot faster than the companies buying them realise. Definitely a lot faster than the investors realise.

Ardren 109 days ago

When did Blackwell start shipping in volume? An H200 is still > $30,000.

I'd settle for some free 80GB A100 cards! ($7,000 second-hand on eBay right now)

cubefox 109 days ago

ASICs only work for very small and heavily quantized models. Moreover, they are fixed function hardware, so whenever you have a new model, you have to throw the current chips away and design and buy new ones. That's like buying a new CPU every time a new OS version comes out.

lumost 109 days ago

The latest strategies of etching weights into silicon seem like they can be generalized. We currently design GPU/TPU caching on the basis that the weights change frequently. If the weights do not change at all, or change very slowly, then there are other, perhaps more efficient, ways of laying out the memory on the chip, somewhere between permanently etching a model onto silicon and using GPUs designed for graphics computation.

intrasight 109 days ago

I'm assuming that they will do a silicon etching run once a year. Might be an interesting acquisition opportunity for Apple since that's the rhythm of their device release.

lumost 109 days ago

It's a good point; it would be a nice "upgrade story" to get the next-generation model. At a fixed cost of ~$1000 per model, it wouldn't be a bad deal relative to current API costs.
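
A rough way to sanity-check the fixed-cost comparison above is a break-even calculation. The sketch below takes the ~$1000 per model figure from the comment, but the API price is a made-up placeholder, not any provider's actual rate, and electricity and other running costs are ignored.

```python
def break_even_tokens(fixed_cost_usd: float, api_usd_per_million_tokens: float) -> float:
    """Number of tokens at which a one-off hardware cost equals cumulative API spend."""
    if api_usd_per_million_tokens <= 0:
        raise ValueError("API price must be positive")
    return fixed_cost_usd / api_usd_per_million_tokens * 1_000_000


# ~$1000 per model comes from the comment above.
# $10 per million tokens is a hypothetical price chosen only for illustration.
tokens = break_even_tokens(1000.0, 10.0)
print(f"break-even at about {tokens:,.0f} tokens")
```

Under those assumed numbers, the chip would only pay for itself for someone who pushes a lot of tokens through a single model generation; a cheaper API price moves the break-even point further out.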

cubefox 109 days ago

That would be something like an FPGA, which has been very unpopular so far due to high cost. FPGAs also only support a relatively small number of weights.

joefourier 109 days ago

That depends on what kind of ASIC you’re talking about. Cerebras can run models like GLM 4.7 with 355B parameters.

cubefox 108 days ago

Cerebras just uses SRAM instead of DRAM. An ASIC would instead hardwire the neural network.

joefourier 108 days ago

Surely it's more of a spectrum? From a CPU, to a TPU, to a chip that hardwires softmax attention but lets you store arbitrary weights, to one that hardwires the weights directly.

surfmike 109 days ago

Google’s training and running all their stuff on ASICs, and it seems to be working out well.

r_lee 109 days ago

They're TPUs: the same thing as GPUs, but specifically for tensor ops.

cubefox 109 days ago

TPUs are not ASICs if they can execute arbitrary models.

AdamN 109 days ago

Forgive my ignorance but wouldn't a TPU be a kind of ASIC where the application is model inference? The TPU Wikipedia article also says it's an ASIC - we should update it if it's wrong.

cubefox 108 days ago

In the limit, even a CPU could be called an ASIC because certain algorithmic operations (ALU etc.) are implemented in hardware. CPU and ASIC are really poles of a gradient, with a CPU implementing very little in hardware and most in software, while an ASIC has very little software and lots of hardware. A TPU is presumably in between. I would argue, however, that it is closer to a GPU than to a full-blown ASIC, because the weights are stored in memory only, making them software.

alephnerd 109 days ago

> investments are going to be severely outdated in a couple of years

Compute in DCs already has an accounting lifespan of 3 years. The current trend of investments is a mix of expansion as well as upgrades on existing capacity.

This is why hyperscalers like Amazon, Microsoft, and GCP invested in inference ASICs a couple years ago, so they could migrate a larger mixture of their compute to these and offer services at better margins.
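
To make the 3-year accounting lifespan concrete, here is a minimal sketch of straight-line depreciation with zero salvage value. Treating the schedule as straight-line is an assumption (the comment only gives the lifespan), and the $30,000 purchase price is a placeholder loosely based on the H200 figure quoted earlier in the thread.

```python
def straight_line_book_value(purchase_usd: float, lifespan_years: float, age_years: float) -> float:
    """Book value under straight-line depreciation, assuming zero salvage value."""
    if lifespan_years <= 0:
        raise ValueError("lifespan must be positive")
    # Fraction of the original value still on the books; never below zero.
    remaining_fraction = max(0.0, 1.0 - age_years / lifespan_years)
    return purchase_usd * remaining_fraction


# Placeholder purchase price; 3-year lifespan is from the comment above.
for age in range(4):
    print(f"year {age}: ${straight_line_book_value(30_000.0, 3.0, age):,.0f}")
```

On this schedule the hardware is already written down to zero on the books after three years, regardless of what it would fetch on the second-hand market.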

RaftPeople 109 days ago

I've been thinking the same thing although not specifically about ASICs.

I was thinking any breakthrough in hardware (e.g. spintronics etc.), even if just partially effective, means all of this hardware would need to be replaced.

raw_anon_1111 109 days ago

GPUs' mean lifetime is around 3 years at scale. They are going to need to be replaced anyway.

AndroTux 109 days ago

That’s okay by me. I’m ready to buy one or two on the cheap

raw_anon_1111 109 days ago

By mean lifetime I mean “failure”. They won’t be any good.

jazzyjackson 109 days ago

They don’t fail after 3 years; they're just a poor use of electricity once the next generation of silicon hits. It’s not economical to keep the old hardware running when it’s taking up rack space.

ronsor 109 days ago

I assure you datacenter GPUs like B200 do fail regularly (within months in many cases), so much so that it's a problem for labs doing large training runs.
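
The "mean lifetime of about 3 years" and "failures within months" claims in this subthread are not necessarily in conflict. As a back-of-the-envelope sketch, assume a constant failure rate (exponentially distributed lifetimes, which real hardware only approximates) and a hypothetical cluster of 10,000 GPUs; both the model and the cluster size are assumptions, not figures from the thread.

```python
import math


def expected_failures_per_day(num_gpus: int, mean_lifetime_years: float) -> float:
    """Expected failures per day in a fleet, assuming a constant failure rate."""
    return num_gpus / (mean_lifetime_years * 365.0)


def fraction_failed_by(months: float, mean_lifetime_years: float) -> float:
    """Share of units expected to have failed after `months`, same exponential assumption."""
    return 1.0 - math.exp(-(months / 12.0) / mean_lifetime_years)


# Hypothetical 10,000-GPU cluster; 3-year mean lifetime is the figure from the thread.
print(f"{expected_failures_per_day(10_000, 3.0):.1f} failures per day")
print(f"{fraction_failed_by(6, 3.0):.1%} failed within 6 months")
```

Under these assumptions a large training run would see several failures a day, and a noticeable share of cards would die within their first months, even though the average card lasts years.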

cmxch 108 days ago

As long as they can be made to work in a consumer or homelab setting, they are still useful.

raw_anon_1111 107 days ago

By “fail” I mean stop working.
