|
|
|
|
|
by energy123
501 days ago
|
|
The main reason topography emerges in physical brains is because spatially distant connections are physically difficult and expensive in biological systems. Artificial neural nets have no such trade-off. So what's the motivation here? I can understand this might be a very good regularizer, so it could help with generalization error on small-data tasks. But hard to see why this should be on the critical path to AGI. As compute and data grows, you want less inductive bias. For example, CNN will beat ViT on small data tasks, but that flips with enough scale because ViT imposes less inductive bias. Or at least any inductive bias should be chosen because it models the structure of the data well, such as with causal transformers and language. |
|
If the problem of training and inference on neural networks can be optimized so that a topology can be used to keep closely related data together, we will see huge advancements in training and inference speed, and probably in model size as a result.
And speed isn't just speed. Speed makes impossible (not enough time in our lifetime) things possible.
A huge factor in Deepseek being able to train on H800 (half HBM bandwith as H100) is that they used GPU cores to compress/decompress the data moved around between the GPU memory and the compute units. This reduces latency in accessing data and made up for the slower memory bandwith (which translates in higher latency when fetching data). Anything that reduces the latency of memory accesses is a huge accelerator for neural nets. The number one way to achieve this is to keep related data next to each other, so that it fits in the closest caches possible.