HN Mirror

by AYBABTME 499 days ago

Locality of data and computation is very important in neural nets. It's the number one reason why training and inference are as slow as they are. It's why GPUs need super expensive HBM, why NVLink is a thing, why InfiniBand is a thing.

If the problem of training and inference on neural networks can be optimized so that a topology can be used to keep closely related data together, we will see huge advancements in training and inference speed, and probably in model size as a result.

And speed isn't just speed. Speed makes impossible (not enough time in our lifetime) things possible.

A huge factor in DeepSeek being able to train on H800 (half the HBM bandwidth of the H100) is that they used GPU cores to compress/decompress the data moved between the GPU memory and the compute units. This reduced latency in accessing data and made up for the slower memory bandwidth (which translates into higher latency when fetching data). Anything that reduces the latency of memory accesses is a huge accelerator for neural nets. The number one way to achieve this is to keep related data next to each other, so that it fits in the closest caches possible.
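
To make the "keep related data next to each other" point concrete, here is a toy sketch rather than a model of any real GPU. It counts misses in a tiny made-up LRU cache while walking the same row-major matrix in two orders. The cache size, line size, matrix shape and the `count_misses` helper are all assumptions picked for illustration.

```python
from collections import OrderedDict


def count_misses(addresses, line_size=8, num_lines=4):
    """Count misses in a toy fully associative LRU cache.

    line_size and num_lines are illustrative assumptions, not real hardware values.
    """
    cache = OrderedDict()  # keys are cache-line ids, ordered from oldest to newest
    misses = 0
    for addr in addresses:
        line = addr // line_size  # neighbouring addresses share a cache line
        if line in cache:
            cache.move_to_end(line)  # hit: mark the line as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


rows, cols = 64, 64  # hypothetical matrix stored row-major in flat memory

# Same 4096 elements, visited in two different orders.
row_major = [r * cols + c for r in range(rows) for c in range(cols)]
col_major = [r * cols + c for c in range(cols) for r in range(rows)]

row_misses = count_misses(row_major)
col_misses = count_misses(col_major)
print(row_misses, col_misses)  # prints "512 4096"
```

Walking the data in the order it is laid out touches each cache line once (512 misses), while striding across rows misses on every single access (4096). Same data, same amount of work, 8x the memory traffic in this toy setup.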

2 comments

mirekrusin 499 days ago

It's true, but isn't OP also correct? I.e., it's about speed, which implies locality, which implies approaches like MoE, which does exactly that and is unlike physical brain topology?

Having said that, it would be fun to see things like data being rearranged based on the temperature of silicon parts after a training cycle.

nickpsecurity 499 days ago

Well, locality and the global nature of pre-training methods. The brain mostly uses local learning (Hebbian learning), which requires less data movement. AI firms putting as much money into making that scale as they did on backpropagation might drop costs a lot.
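
For anyone unfamiliar with why Hebbian learning is "local", here is a minimal sketch of the plain textbook rule, where each weight changes by `lr * post * pre`. The `hebbian_step` helper, the layer sizes and the learning rate are hypothetical, and this is not a claim about how the brain or any production system implements it.

```python
def hebbian_step(weights, pre, post, lr=0.1):
    """Apply one plain Hebbian update: w[i][j] += lr * post[i] * pre[j].

    Each weight only needs the activity of the two units it connects,
    so no error signal has to travel back through the rest of the network.
    """
    return [
        [w + lr * post_i * pre_j for w, pre_j in zip(row, pre)]
        for row, post_i in zip(weights, post)
    ]


# Hypothetical toy layer: 3 inputs, 2 outputs, weights starting at zero.
weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
pre = [1.0, 0.0, 1.0]   # input activity (assumed values)
post = [1.0, 0.0]       # output activity (assumed values)

updated = hebbian_step(weights, pre, post, lr=0.5)
print(updated)  # [[0.5, 0.0, 0.5], [0.0, 0.0, 0.0]]
```

Only the connections between units that were active together get strengthened. Compare with backpropagation, where updating a weight needs gradients computed from the loss at the far end of the network, which is where the extra data movement comes from.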
