We're currently running up against the Von Neumann bottleneck in ML. It's a great architecture for general-purpose compute, but it's not suited to running neural networks efficiently: a ton of energy is spent shuttling weight and activation values between GPU RAM and ALUs. Biological networks appear to be able to process a large amount of information per unit energy by exploiting the static physical structure of the network. A small amount of energy in sparse neural activations can transform a large volume of information stored in the structure and synapse strengths.
Here's some very rough napkin math: assume the brain has 100 billion neurons, each with 1000 synapses, and is sparsely activated at ~5% activation, and processes at 30Hz (gamma wave frequency, roughly). This means every "tick", 5 billion active neurons fire across 5 trillion synapses. If you tried to implement this in a Von Neumann architecture, even if you only calculate for the sparsely activated neurons, and even if you quantize your weight values to 1 byte, that's 5 TB of weights read per tick, or ~150 TB/s of weight data at 30 ticks per second. That's an insane amount of memory bandwidth. An A100 delivers about 2 TB/s at 300W, but our brain only uses 20W.
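For anyone who wants to poke at the numbers, here's the same napkin math as a small Python sketch. Every constant is one of the rough assumptions from the paragraph above, not a measurement, and the function and variable names are just made up for illustration. It only counts weight bandwidth; it ignores activations, compute, and everything else.

```python
# Napkin-math inputs: rough assumptions from the comment, not measurements.
NEURONS = 100e9              # ~100 billion neurons
SYNAPSES_PER_NEURON = 1_000  # ~1000 synapses each
ACTIVE_FRACTION = 0.05       # ~5% sparse activation
TICK_RATE_HZ = 30            # ~30 Hz, roughly gamma wave frequency
BYTES_PER_WEIGHT = 1         # weights quantized to 1 byte

A100_BANDWIDTH_TB_S = 2.0    # ~2 TB/s memory bandwidth
A100_POWER_W = 300.0         # ~300 W
BRAIN_POWER_W = 20.0         # ~20 W


def weight_bandwidth_tb_s(neurons, synapses_per_neuron, active_fraction,
                          tick_rate_hz, bytes_per_weight):
    """Weight bytes that must be read per second, in TB/s (1 TB = 1e12 bytes)."""
    active_neurons = neurons * active_fraction
    synapses_per_tick = active_neurons * synapses_per_neuron
    bytes_per_second = synapses_per_tick * bytes_per_weight * tick_rate_hz
    return bytes_per_second / 1e12


required_tb_s = weight_bandwidth_tb_s(
    NEURONS, SYNAPSES_PER_NEURON, ACTIVE_FRACTION, TICK_RATE_HZ, BYTES_PER_WEIGHT
)
# How many A100s' worth of memory bandwidth that is, and what they would draw.
a100_equivalents = required_tb_s / A100_BANDWIDTH_TB_S
gpu_power_w = a100_equivalents * A100_POWER_W
power_ratio = gpu_power_w / BRAIN_POWER_W

print(f"required weight bandwidth: {required_tb_s:.0f} TB/s")
print(f"A100 equivalents (bandwidth only): {a100_equivalents:.0f}")
print(f"GPU power: {gpu_power_w:.0f} W vs brain {BRAIN_POWER_W:.0f} W "
      f"({power_ratio:.0f}x)")
```

With these inputs you'd need the memory bandwidth of about 75 A100s, which is around 22.5 kW against the brain's 20W. Change any of the constants by a factor of a few and the gap is still a couple of orders of magnitude.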
My two cents is that the future will involve a hardware architecture that lets us avoid moving these weights around during inference. Whether this will mean a "mortal computer" as Hinton has recently discussed (https://www.youtube.com/watch?v=sghvwkXV3VU) or whether the weights will be loaded and fixed at init time (as with Mythic's or other neuromorphic approaches), time will tell.