| I hate to dog on research papers. They’re work to write. That said, I think this paper is not likely to be of interest to AI researchers — instead it may be of interest to Neuroscience folks or other brain research types. The lede — adding topography worsens networks at similar weights — is not only buried, it’s obscured with statements claiming that topo networks show less upheaval when scaled down, e.g. they are more efficient than similar weight networks. It’s hard for me to see how both these things can be true — the graphs show the more topography is added, the worse the networks perform at the trained model sizes. To have the second statement “They compress better and are therefore more efficient” also be true, I think you’d need to show a pretty remarkable claim, which is that while a model trained at the same scale as a llama architecture is worse, when you scale them both down, this model becomes not only better than the scaled down llama, but also better than a natively trained model at the new smaller scale. There is no proof of this in the paper, and good reason to be skeptical of this idea based on the data presented. That said, like a lot of ideas in AI, this .. works! You can train a model successfully imposing these outside structures on it, and that model doesn’t even suck very much. Which is a cool statement about complexity theory and the resilience of these architectures, in my opinion. But I don’t think it says much else about either the brain or underlying AI ‘truths’. |