

	by _hark 435 days ago
	Can anyone comment on where efficiency gains come from these days at the arch level, i.e. not from process-node improvements? Are there a few big things, or many small things? I'm curious what low-hanging fruit is left for fast SIMD matrix multiplication.

3 comments

vessenes 435 days ago

One big area over the last two years has been algorithmic improvements feeding hardware improvements. Supercomputer folks use (or used) f64 for everything. Four years ago most training was done in f32. As algo teams have shown that fp8 can be used for training and inference, hardware has been updated to accommodate it, yielding big gains.

NB: Hobbyist, take all with a grain of salt
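To put rough numbers on the move from f64 to f32 to fp8, here is a back-of-the-envelope sketch of how much memory a single weight matrix occupies in each format. The matrix size and the helper name `matrix_bytes` are hypothetical; actual speedups also depend on how many narrow-format multiply-accumulate units the hardware provides.

```python
# Bytes per element for common floating-point formats. Both fp8 variants
# (E4M3 and E5M2) occupy one byte.
BYTES_PER_ELEMENT = {"f64": 8, "f32": 4, "bf16": 2, "fp16": 2, "fp8": 1}


def matrix_bytes(rows: int, cols: int, fmt: str) -> int:
    """Bytes needed to store (or stream once from memory) a rows x cols matrix."""
    return rows * cols * BYTES_PER_ELEMENT[fmt]


if __name__ == "__main__":
    n = 8192  # hypothetical square weight matrix
    base = matrix_bytes(n, n, "f64")
    for fmt in ("f64", "f32", "bf16", "fp8"):
        size = matrix_bytes(n, n, fmt)
        print(f"{fmt:>4}: {size / 2**20:6.0f} MiB, {base // size}x smaller than f64")
```

Halving the element width also doubles how many operands fit in each SIMD register or cache line, which is part of why narrower formats raise throughput, not just capacity.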


jmalicki 435 days ago

Unlike a lot of supercomputer algorithms, where floating-point error accumulates as you go, gradient-descent-based algorithms don't need as much precision: any fp error still shows up in the next loss-function calculation and gets corrected there, which lets you make do with much lower precision.
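A minimal sketch of that self-correcting behaviour, assuming a toy quadratic loss: every gradient component is rounded to a few mantissa bits before the update, yet the iteration still converges because the next gradient is recomputed from the current weights. The helpers `round_mantissa` and `gradient_descent` are hypothetical; the sketch keeps the weights in full precision (as mixed-precision training does with its master copy of the weights) and ignores the limited exponent range of real fp8 formats.

```python
import math
from typing import List, Optional, Sequence


def round_mantissa(x: float, bits: int) -> float:
    """Round x to `bits` explicit mantissa bits (round-half-to-even).

    Emulates only the reduced precision of a narrow float format; the
    exponent range is left unbounded, so there is no overflow or underflow.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)          # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (bits + 1)     # keep bits + 1 significant bits (incl. the implicit one)
    return math.ldexp(round(m * scale) / scale, e)


def gradient_descent(
    target: Sequence[float],
    lr: float = 0.1,
    steps: int = 200,
    grad_bits: Optional[int] = None,
) -> List[float]:
    """Minimize 0.5 * ||w - target||^2 starting from w = 0.

    If grad_bits is set, every gradient component is rounded to that many
    mantissa bits; the weights themselves stay in full (f64) precision.
    """
    w = [0.0] * len(target)
    for _ in range(steps):
        # The gradient is recomputed from the current weights each step, so
        # rounding error from the previous step is seen and corrected here.
        grad = [wi - ti for wi, ti in zip(w, target)]
        if grad_bits is not None:
            grad = [round_mantissa(g, grad_bits) for g in grad]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    target = [3.0, -1.5, 0.25]
    for bits in (None, 3, 1):
        w = gradient_descent(target, grad_bits=bits)
        err = max(abs(wi - ti) for wi, ti in zip(w, target))
        print(f"grad_bits={bits}: max |w - target| = {err:.1e}")
```

With 3 mantissa bits (the precision of the fp8 E4M3 format) each rounded gradient is within about 6% of the true one, so every step still points downhill and the error keeps shrinking geometrically. Rounding the weights themselves could instead stall progress once updates become smaller than the format's grid spacing.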


cubefox 434 days ago

Much lower indeed. Even Boolean functions (e.g. AND) are differentiable (though not exactly in the Newton/Leibniz sense), and this can be used for backpropagation. It allows for an optimizer similar to stochastic gradient descent. There is a paper on it: https://arxiv.org/abs/2405.16339
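For a concrete sense of what a derivative of a Boolean function can mean, here is the classical Boolean difference from logic design, df/dx_i = f(x_i = 1) XOR f(x_i = 0), which is 1 exactly when flipping x_i flips the output. This is not necessarily the formulation used in the linked paper, and the names `boolean_difference` and `and_gate` are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence


def boolean_difference(f: Callable[..., int], i: int, x: Sequence[int]) -> int:
    """Boolean difference of f with respect to input i, evaluated at x.

    df/dx_i = f(x with x_i = 1) XOR f(x with x_i = 0), i.e. 1 exactly when
    flipping x_i flips the output of f.
    """
    hi, lo = list(x), list(x)
    hi[i], lo[i] = 1, 0
    return f(*hi) ^ f(*lo)


def and_gate(a: int, b: int) -> int:
    return a & b


if __name__ == "__main__":
    # For AND, the derivative with respect to a is simply b: flipping a only
    # changes the output when b is 1.
    for a, b in product((0, 1), repeat=2):
        print(f"a={a} b={b}  d(a AND b)/da = {boolean_difference(and_gate, 0, (a, b))}")
```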

It seems to me that floating-point math (matrix multiplication) will mostly disappear from ML chips over time, since Boolean operations are much faster in both training and inference. But current chips are still optimized for FP rather than Boolean operations.
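One concrete reason Boolean arithmetic can be cheaper: a dot product of ±1 vectors reduces to a bitwise XNOR followed by a population count instead of n multiply-accumulates (binarized networks such as XNOR-Net rely on this). The sketch below packs ±1 vectors into Python integers, with bit 1 meaning +1 and bit 0 meaning −1; `pack` and `binary_dot` are hypothetical helper names, and the code illustrates only the identity dot = 2·popcount(XNOR(a, b)) − n, not any particular chip.

```python
from typing import Sequence


def pack(v: Sequence[int]) -> int:
    """Pack a +1/-1 vector into an int: bit k is set where v[k] == +1."""
    bits = 0
    for k, x in enumerate(v):
        if x not in (1, -1):
            raise ValueError("entries must be +1 or -1")
        if x == 1:
            bits |= 1 << k
    return bits


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed +1/-1 vectors of length n.

    Matching positions contribute +1 and mismatches -1, so
    dot = agreements - (n - agreements) = 2 * popcount(XNOR(a, b)) - n.
    """
    mask = (1 << n) - 1                 # keep only the n valid bit positions
    xnor = ~(a_bits ^ b_bits) & mask    # 1 where the signs agree
    agreements = bin(xnor).count("1")   # population count
    return 2 * agreements - n


if __name__ == "__main__":
    a = [1, -1, -1, 1, 1, -1, 1, 1]
    b = [1, 1, -1, -1, 1, -1, -1, 1]
    reference = sum(x * y for x, y in zip(a, b))
    print(binary_dot(pack(a), pack(b), len(a)), reference)
```

On a CPU with a popcount instruction, a single 64-bit XNOR plus popcount covers 64 element pairs, which is the kind of saving the comment is pointing at.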


muxamilian 435 days ago

In-memory computing (analog or digital). Still doing SIMD matrix multiplication but using more efficient hardware: https://arxiv.org/html/2401.14428v1 https://www.nature.com/articles/s41565-020-0655-z
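As a rough illustration of the analog case: in a crossbar, weights are stored as conductances G and inputs are applied as row voltages V, so by Ohm's law and Kirchhoff's current law each column current is I_j = Σ_i G_ij·V_i, and the matrix-vector product happens where the weights live. The sketch simulates that idealized relation with an optional multiplicative noise term; the noise model, its magnitude, and the name `crossbar_mvm` are assumptions for illustration, not taken from the linked papers.

```python
import random
from typing import List, Optional, Sequence


def crossbar_mvm(
    conductances: Sequence[Sequence[float]],
    voltages: Sequence[float],
    noise_std: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Idealized analog crossbar: column current I_j = sum_i G[i][j] * V[i].

    Each conductance is perturbed by multiplicative Gaussian noise with
    standard deviation noise_std (a simplified, assumed device-variation model).
    """
    if rng is None:
        rng = random.Random(0)
    n_rows, n_cols = len(conductances), len(conductances[0])
    currents = [0.0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            g = conductances[i][j]
            if noise_std > 0.0:
                g *= 1.0 + rng.gauss(0.0, noise_std)
            # Ohm's law gives g * V[i]; Kirchhoff sums the currents on column j.
            currents[j] += g * voltages[i]
    return currents


if __name__ == "__main__":
    G = [[1.0, 0.5], [0.2, 0.8], [0.0, 0.3]]  # 3 inputs x 2 outputs, arbitrary units
    V = [0.1, 0.2, 0.3]
    print("ideal:", crossbar_mvm(G, V))
    print("noisy:", crossbar_mvm(G, V, noise_std=0.05))
```

The trade-off this hints at: the multiply-accumulate costs almost no data movement, but the result inherits device noise, which is one reason low-precision-tolerant workloads like the ones discussed above are a natural fit.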


gautamcgoel 435 days ago

This is very interesting, but not what the Ironwood TPU is doing. The blog post says that the TPU uses conventional HBM.


nsteel 435 days ago

There's been some talk (and rumour) of next-gen HBM having some compute capability on the base die. But again, that's not what they're doing here; this is regular HBM3/HBM3e.

https://semiengineering.com/speeding-down-memory-lane-with-c...


yeahwhatever10 435 days ago

Specialization, i.e. specializing for inference.
