by justanotherjoe
848 days ago
Fuck. I have an idea just like this one. I guess it's true that ideas are a dime a dozen.
To me, diffusion bears a remarkable similarity to backpropagation, and I thought it could be used in place of backpropagation for some parts of a model. I would also posit that residual (ResNet-style) connections, especially in transformers, let the model behave more exploratorily, which is really powerful and a necessary component of what makes transformers work. The more I think about it, the more the transformer looks like a great architecture; it does so many things right. That is not really related to the topic, though.
Transformers are just networks that learn to program the weights of other networks [1]. In the successful cases, the programmed network has been quite primitive (merely a key-value store), so that errors can be backpropagated from the programmed network's outputs all the way back to the programmer network's inputs.
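To make the "programmer" view concrete, here is a minimal sketch assuming unnormalized causal linear attention with an identity feature map (no softmax, no normalization), the simplest setting in which the equivalence described in [1] holds. At each step the programmer writes a rank-one update v k^T into a weight matrix W (the key-value store) and then reads out W q; the result matches computing causal attention directly. The function names and toy vectors are illustrative assumptions, not code from [1].

```python
# Sketch: unnormalized causal linear attention == fast weight programming.
# Assumptions: identity feature map, no softmax, no normalization.

def outer(v, k):
    """Outer product v k^T as a list of rows."""
    return [[vi * kj for kj in k] for vi in v]

def matvec(W, x):
    """Matrix-vector product W x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def mat_add(A, B):
    """Elementwise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def fast_weight_outputs(keys, values, queries):
    """Programmer view: write v_t k_t^T into W, then query W with q_t."""
    d_out, d_in = len(values[0]), len(keys[0])
    W = [[0.0] * d_in for _ in range(d_out)]
    outputs = []
    for k, v, q in zip(keys, values, queries):
        W = mat_add(W, outer(v, k))   # write: rank-one key-value update
        outputs.append(matvec(W, q))  # read: run the programmed network
    return outputs

def causal_linear_attention(keys, values, queries):
    """Attention view: y_t = sum over s <= t of v_s * (k_s . q_t)."""
    outputs = []
    for t, q in enumerate(queries):
        y = [0.0] * len(values[0])
        for s in range(t + 1):
            score = sum(a * b for a, b in zip(keys[s], q))
            y = [yi + score * vi for yi, vi in zip(y, values[s])]
        outputs.append(y)
    return outputs

if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    values = [[2.0, 0.0, 1.0], [0.0, 3.0, -1.0], [1.0, 1.0, 1.0]]
    queries = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    print(fast_weight_outputs(keys, values, queries))
    print(causal_linear_attention(keys, values, queries))
```

Because the programmed network is only a matrix built from sums of outer products, both views are differentiable end to end, so errors at the read-out can flow back into whatever produced the keys, values and queries.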
The present work extends this idea to a different kind of programmed network: a convolutional image-processing network.
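A hypothetical sketch of the same pattern with a convolutional programmed network: a "programmer" (here just a linear map `P`, an assumption made for brevity) turns a context vector into the nine weights of a 3x3 kernel, and that kernel is then applied to an image. This is not the method of the work under discussion; the names, shapes and linear programmer are all illustrative.

```python
# Hypothetical sketch: a programmer emits a 3x3 conv kernel from a context vector.
# Assumptions: linear programmer P (9 x n), single channel, stride 1, no padding.

def program_kernel(context, P):
    """Programmer: map a context vector to 9 weights, reshaped to a 3x3 kernel."""
    flat = [sum(p * c for p, c in zip(row, context)) for row in P]
    return [flat[0:3], flat[3:6], flat[6:9]]

def conv2d_valid(image, kernel):
    """Programmed network: 'valid' 2D cross-correlation of image with kernel."""
    kh, kw = len(kernel), len(kernel[0])
    H, W = len(image), len(image[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

if __name__ == "__main__":
    # Column 0 of P programs a box filter; column 1 programs an identity (center) tap.
    P = [[1.0, 1.0 if r == 4 else 0.0] for r in range(9)]
    image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    print(conv2d_valid(image, program_kernel([1.0, 0.0], P)))  # box-filtered windows
    print(conv2d_valid(image, program_kernel([0.0, 1.0], P)))  # image interior
```

Both stages are differentiable, so an error measured on the convolution output can in principle be backpropagated into `P`, which is the property the comment says the programmer/programmed split depends on.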
There are many more breakthroughs to be achieved along this line of research; it is a rich vein to mine. I believe our best shot at getting neural networks to do discrete math and symbolic logic, and to write nontrivial computer programs, lies in this direction.
[1] https://arxiv.org/abs/2102.11174