by cubefox 106 days ago
ASICs only work for very small and heavily quantized models. Moreover, they are fixed function hardware, so whenever you have a new model, you have to throw the current chips away and design and buy new ones. That's like buying a new CPU every time a new OS version comes out.

The latest strategies of etching weights into silicon seem like they can be generalized. We currently design GPU/TPU caching on the assumption that the weights change frequently. If the weights do not change at all, or change only very slowly, then there may be more efficient ways of laying out the memory on the chip, somewhere between permanently etching a model onto silicon and using GPUs designed for graphics computation.
I'm assuming that they will do a silicon etching run once a year. Might be an interesting acquisition opportunity for Apple since that's the rhythm of their device release.
It's a good point; it would be a nice "upgrade story" to get the next-generation model. At a fixed cost of ~$1000 per model, it wouldn't be a bad deal relative to current API costs.
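A rough way to sanity-check that: the sketch below computes how many tokens you would need to generate before a ~$1000 chip beats paying per token. The API price used in the example is a made-up placeholder, not a real quote, and the function name is just for illustration. It also ignores power, hosting, and the fact that the hardwired model may be weaker than the API one.

```python
def break_even_tokens(chip_cost_usd: float, api_usd_per_million_tokens: float) -> float:
    """Tokens you must generate before a one-off chip purchase
    costs less than paying an API per token (hardware cost only)."""
    if api_usd_per_million_tokens <= 0:
        raise ValueError("API price must be positive")
    return chip_cost_usd / api_usd_per_million_tokens * 1_000_000


if __name__ == "__main__":
    chip_cost = 1000.0   # the ~$1000 figure from the comment above
    api_price = 10.0     # ASSUMPTION: hypothetical $ per million tokens
    tokens = break_even_tokens(chip_cost, api_price)
    print(f"break-even after {tokens:,.0f} tokens")
```

A cheaper API pushes the break-even point further out, so the deal depends heavily on which price you plug in.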
That would be something like an FPGA. FPGAs have been very unpopular so far due to their high cost, and they also only support a relatively small number of weights.
That depends what kind of ASIC you’re talking about. Cerebras can run models like GLM 4.7 with 355B parameters.
Cerebras just uses SRAM instead of DRAM. An ASIC would instead hardwire the neural network.
Surely it's more of a spectrum? From a CPU, to a TPU, to a chip that hardwires softmax attention but lets you store arbitrary weights, to one that hardwires the weights directly.
Google is training and running all their stuff on ASICs, and it seems to be working out well.
They're TPUs: the same thing as GPUs, but specifically for tensor ops.
TPUs are not ASICs if they can execute arbitrary models.
Forgive my ignorance, but wouldn't a TPU be a kind of ASIC where the application is model inference? The TPU Wikipedia article also says it's an ASIC - we should update it if it's wrong.
In the limit, even a CPU could be called an ASIC, because certain algorithmic operations (ALU etc.) are implemented in hardware. CPU and ASIC are really poles of a gradient: a CPU implements very little in hardware and most in software, while an ASIC has very little software and lots of hardware. A TPU is presumably in between. I would argue, however, that it is closer to a GPU than to a full-blown ASIC, because the weights are stored in memory only, making them software.
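To make the "weights are software" distinction concrete, here is a toy Python sketch. It is only an analogy, not a model of any real chip: `programmable_layer` and `hardwire` are hypothetical names, and a closure over frozen constants stands in for weights etched into silicon.

```python
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[float]]


def programmable_layer(weights: Matrix, x: Sequence[float]) -> List[float]:
    """GPU/TPU-like: the multiply-accumulate logic is fixed,
    but the weights are ordinary data passed in on every call."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def hardwire(weights: Matrix) -> Callable[[Sequence[float]], List[float]]:
    """Hardwired-ASIC-like: weights are frozen once ("fabrication")
    and the resulting function can never run a different model."""
    frozen = tuple(tuple(row) for row in weights)  # immutable copy

    def layer(x: Sequence[float]) -> List[float]:
        return [sum(w * xi for w, xi in zip(row, x)) for row in frozen]

    return layer


if __name__ == "__main__":
    model_v1 = [[1.0, 0.0], [0.0, 2.0]]
    model_v2 = [[0.0, 1.0], [1.0, 0.0]]
    x = [3.0, 4.0]

    chip = hardwire(model_v1)               # "tape-out" with v1 weights
    print(chip(x))                          # always the v1 result
    print(programmable_layer(model_v2, x))  # new model: just pass new weights
```

Running a new model on the programmable path only means passing different weights; on the hardwired path it means calling `hardwire` again, which is the "design and buy new chips" step from the original comment.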