| HN Mirror

Actually it is related.

Transformers are just networks that learn to program the weights of other networks [1]. In the successful cases the programmed network has been quite primitive -- merely a key-value store -- in order to ensure that you can backpropagate errors from the programmed network's outputs all the way to the programmer network's inputs.

The present work extends this idea to a different kind of programmed network: a convolutional image-processing network.

There are many more breakthroughs to be achieved along this line of research -- it is a rich vein to mine. I believe our best shot at getting neural networks to do discrete math and symbolic logic, and to write nontrivial computer programs, will result from this line of research.

[1] https://arxiv.org/abs/2102.11174