My opinion is heavily shaped by this interview (minutes 5-9 are a good summary): https://www.youtube.com/watch?v=0PAiQ1jTN5k.

We're currently running up against the von Neumann bottleneck in ML. It's a great architecture for general-purpose compute, but it's poorly suited to running neural networks efficiently: a ton of energy is spent shuttling weight and activation values between GPU RAM and the ALUs. Biological networks appear to process a large amount of information per unit of energy by exploiting the static physical structure of the network. A small amount of energy in sparse neural activations can transform a large volume of information stored in the structure and synapse strengths.

Here's some very rough napkin math: assume the brain has 100 billion neurons, each with 1000 synapses, is sparsely activated at ~5%, and processes at 30Hz (gamma wave frequency, roughly). This means that every "tick", 5 billion active neurons must push their signals across 5 trillion synapses. If you tried to implement this in a von Neumann architecture, even if you only calculate for the sparsely activated neurons, and even if you quantize your weight values to 1 byte, that's ~5 TB of weights read per tick, or ~150 TB/s of weight data at 30Hz. That's an insane amount of memory bandwidth. An A100 is 2 TB/s at 300W, but our brain only uses 20W.
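
For anyone who wants to poke at the assumptions, here is the same napkin math as a small Python sketch. The function and variable names are mine, made up for illustration, and the inputs are just the rough figures from the paragraph above, not measurements.

```python
def weight_bandwidth_bytes_per_s(neurons, synapses_per_neuron,
                                 active_fraction, tick_hz, bytes_per_weight):
    """Bytes of weight data read per second if every synapse of every
    active neuron has to be fetched from memory on every tick."""
    active_neurons = neurons * active_fraction
    synapses_touched_per_tick = active_neurons * synapses_per_neuron
    return synapses_touched_per_tick * bytes_per_weight * tick_hz


# Rough figures from the comment above (assumptions, not measurements).
bandwidth = weight_bandwidth_bytes_per_s(
    neurons=100e9,             # 100 billion neurons
    synapses_per_neuron=1000,
    active_fraction=0.05,      # ~5% sparse activation
    tick_hz=30,                # roughly gamma wave frequency
    bytes_per_weight=1,        # weights quantized to 1 byte
)

A100_BYTES_PER_S = 2e12        # ~2 TB/s, the figure quoted above
A100_WATTS = 300               # the figure quoted above

bandwidth_tb_per_s = bandwidth / 1e12
a100_equivalents = bandwidth / A100_BYTES_PER_S
naive_watts = a100_equivalents * A100_WATTS

print(f"weight traffic:    {bandwidth_tb_per_s:.0f} TB/s")
print(f"A100 equivalents:  {a100_equivalents:.0f}")
print(f"naive power draw:  {naive_watts:.0f} W (vs ~20 W for a brain)")
```

With these inputs it comes out to 150 TB/s, or about 75 A100s' worth of memory bandwidth. The power figure is a naive extrapolation that just multiplies GPU count by wattage and ignores interconnect, compute, and everything else, so treat it as an order-of-magnitude illustration only.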

My two cents is that the future will involve a hardware architecture that lets us avoid moving these weights around during inference. Whether this will mean a "mortal computer" as Hinton has recently discussed (https://www.youtube.com/watch?v=sghvwkXV3VU) or whether the weights will be loaded and fixed at init time (as with Mythic's or other neuromorphic approaches), time will tell.