Late here, but a few comments: the main idea of the authors was to combine differentiable logic gates (an amazing invention I had not heard of) with cellular automata, as they say in the paper, or, more accurately I would say, a grid topology of small neural networks (cells). The cells get and send information to their neighbors. The idea is that you pick some sort of target outcome for fitness (say, an image you want the cells to self-organize into, or the rules of Conway’s Game of Life), set up the training data, and because it’s fully differentiable, Bob’s your uncle at the end. Depending on what you think about computational complexity, this may or may not shock you.

But since they’ve been doing gradient descent on differentiable logic gates, when the training is done they can just turn each cell into binary gates (think AND, OR, XOR, etc.). You then have something that can be used for inference crazy fast. I presume it could also be laid out and sent to a fab, but that work is left for a later paper. :)

This architecture could do a LOTTT of things, to be clear. But sort of as a warm-up, they use all the Conway’s Life start/end pairs to train cells to implement Conway. Shockingly this can be done in 5 gates(!). I note that they mention almost everywhere that they hand-prune unused gates. I imagine this will eventually be automated.

They then go on to spec small neural networks of 7k parameters or so that, when laid out in cells, can self-organize into different black-and-white or color images, can even do so on larger base grids than they were trained on, and are resilient to noise being thrown at them. They then demonstrate that async networks (each cell updates randomly) can be trained; these are harder to train but more resilient to noise.

All this is quite a lot to take in, and spectacular in my opinion. One thing they mention, a lot, is that a lot of hyperparameter tuning is required for “harder” problems.
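For anyone who wants to see the "train it soft, then snap it to real gates" trick concretely, here is a minimal sketch in NumPy. It is not the paper's code: the names (`SOFT_GATES`, `train_gate`, `harden`) are mine, it trains a single gate rather than a grid of cells, and it uses plain gradient descent with a hand-written gradient. The assumption is the usual relaxation where each of the 16 two-input gates gets a real-valued formula (AND becomes `a*b`, XOR becomes `a+b-2ab`, and so on) and the node learns a softmax mixture over them.

```python
import numpy as np

# Real-valued relaxations of all 16 two-input Boolean gates.
# For inputs in {0, 1} each one reproduces the gate's truth table exactly.
SOFT_GATES = [
    ("FALSE", lambda a, b: np.zeros_like(a)),
    ("AND", lambda a, b: a * b),
    ("A_AND_NOT_B", lambda a, b: a - a * b),
    ("A", lambda a, b: a),
    ("NOT_A_AND_B", lambda a, b: b - a * b),
    ("B", lambda a, b: b),
    ("XOR", lambda a, b: a + b - 2 * a * b),
    ("OR", lambda a, b: a + b - a * b),
    ("NOR", lambda a, b: 1 - (a + b - a * b)),
    ("XNOR", lambda a, b: 1 - (a + b - 2 * a * b)),
    ("NOT_B", lambda a, b: 1 - b),
    ("A_OR_NOT_B", lambda a, b: 1 - b + a * b),
    ("NOT_A", lambda a, b: 1 - a),
    ("NOT_A_OR_B", lambda a, b: 1 - a + a * b),
    ("NAND", lambda a, b: 1 - a * b),
    ("TRUE", lambda a, b: np.ones_like(a)),
]


def gate_outputs(a, b):
    """Outputs of all 16 relaxed gates, stacked to shape (16, n_samples)."""
    return np.stack([fn(a, b) for _, fn in SOFT_GATES])


def softmax(w):
    z = np.exp(w - w.max())
    return z / z.sum()


def train_gate(a, b, target, steps=500, lr=1.0):
    """Learn logits over the 16 gates by gradient descent on squared error."""
    w = np.zeros(len(SOFT_GATES))        # start from a uniform mixture
    feats = gate_outputs(a, b)           # (16, n)
    for _ in range(steps):
        p = softmax(w)
        err = p @ feats - target         # soft output minus target, (n,)
        g = 2.0 * feats @ err / err.size # dLoss/dp for each gate, (16,)
        w -= lr * p * (g - p @ g)        # chain rule through the softmax
    return w


def harden(w):
    """Discretize: keep only the single most probable gate."""
    return SOFT_GATES[int(np.argmax(w))]


# Full truth table for two binary inputs, with XOR as the target behaviour.
a = np.array([0.0, 0.0, 1.0, 1.0])
b = np.array([0.0, 1.0, 0.0, 1.0])
target = np.array([0.0, 1.0, 1.0, 0.0])

w = train_gate(a, b, target)
name, gate = harden(w)
print(name, gate(a, b))
```

After `harden`, the softmax and the other 15 formulas are gone and what is left is one ordinary gate, which is why inference on the finished circuit can be so cheap. A real network also has to deal with how many gates are wired together, which this sketch ignores.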
I can imagine like 50 lines of research out of this paper, but one of them would certainly be adding stability to the training process. ARC-AGI is mentioned here, and is an awesome idea: could you get a “free lunch” with ARC? Or some of ARC? Different network topologies are yet another interesting question, as are hidden information and “backing layers”. E.g. why not give each cell 20 private cells that info goes out to and comes back in from? Why not make some of those cells talk to some other cells? Why not send radio waves as signals across the custom topology and train an efficient novel analog radio? Why not give each cell access to a shared “super sized” 100k or 1M parameter “thinking node”? What would a good topology be for different tasks?

I’ll stop here. Amazing paper. Quite a number of PhD papers will be generated out of it, I expect. I’d like to see Minecraft implemented though. Seems possible. Then we could have Bad Apple in Minecraft on raw circuits.
Either way this research is fantastic. What a result.