by Lichtso
780 days ago
1. Interestingly, the foundations of this approach and of the MLP were invented or discovered around the same time, about 66 years ago:
   1957: https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Arnold_repr...
   1958: https://en.wikipedia.org/wiki/Multilayer_perceptron
2. Another advantage of this approach is that it has only one class of parameters (the coefficients of the local activation functions), whereas an MLP has three (weights, biases, and the globally uniform activation function).
3. Everybody is talking about transformers. I want to see diffusion models built with this approach.
|
There isn't much difference between weights of a linear sum and coefficients of a spline.
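
To make the comparison concrete, here is a rough sketch, not taken from the KAN paper or any library. It assumes the simplest possible spline: piecewise-linear "hat" basis functions on a uniform grid. The names `hat_basis`, `spline_edge`, `kan_node`, and `mlp_neuron` are made up for illustration.

```python
def hat_basis(x, knots):
    """Evaluate one piecewise-linear hat function per knot at x.

    Assumes the knots are uniformly spaced (illustrative simplification).
    """
    h = knots[1] - knots[0]
    return [max(0.0, 1.0 - abs(x - t) / h) for t in knots]


def spline_edge(x, coeffs, knots):
    """Learnable edge activation: a weighted sum of fixed basis functions.

    The output is linear in `coeffs`, just as a linear layer's output
    is linear in its weights.
    """
    return sum(c * b for c, b in zip(coeffs, hat_basis(x, knots)))


def kan_node(xs, edge_coeffs, knots):
    """KAN-style node: add up one learnable edge function per input."""
    return sum(spline_edge(x, c, knots) for x, c in zip(xs, edge_coeffs))


def mlp_neuron(xs, weights, bias):
    """MLP neuron: weighted sum plus bias, then a fixed global activation (ReLU)."""
    z = sum(w * x for w, x in zip(weights, xs)) + bias
    return max(0.0, z)


knots = [0.0, 0.5, 1.0]
print(spline_edge(0.25, [0.0, 1.0, 0.0], knots))
print(kan_node([0.25, 0.5], [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0]], knots))
print(mlp_neuron([1.0, 2.0], [0.5, -1.0], 0.25))
```

In this sketch the spline coefficients play the same role as weights: the edge output is a dot product between `coeffs` and a fixed feature vector of basis values. The difference is where the nonlinearity sits. It is inside the basis functions, applied before the learnable sum, rather than applied after it.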