by pidtuner
2125 days ago
"The real promise of these methods is to use the universal approximator power of NNs...", still if one is to use a grey-box non-linear model dx/dt = F(x, u, t), why use NNs to characterize F? I would be more comfortable using a polynomial to characterize non-linearity than a "deep" black-box. Polynomials are much easier to "train" because it is just one linear regression with no iteration. It has also been hinted that NN are in essence polynomial regressions [0]. Furthermore, most activation functions are base on e^x where the actual implementation of e^x in a computer is again a polynomial! [0] https://arxiv.org/abs/1806.06850 |
I would have thought a computer uses tables to compute e^x. There are also piecewise linear activation functions whose gradients are trivially easy to compute.
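For illustration, a rough sketch of both points. `exp_sketch` is a simplified stand-in, not how any particular math library does it: it reduces the argument to a small range and then evaluates a truncated Taylor polynomial there (real implementations often combine a reduction step like this with small lookup tables, so tables and polynomials are not mutually exclusive). `relu` and `relu_grad` show a piecewise linear activation and its gradient. All three function names are hypothetical.

```python
import math

def exp_sketch(x, terms=12):
    # Range reduction: write x = k*ln(2) + r with |r| <= ln(2)/2,
    # so that e^x = 2^k * e^r.
    k = round(x / math.log(2.0))
    r = x - k * math.log(2.0)
    # Truncated Taylor polynomial for e^r, evaluated with Horner's rule.
    p = 1.0
    for n in range(terms, 0, -1):
        p = 1.0 + r * p / n
    # Scale by 2^k.
    return math.ldexp(p, k)

def relu(x):
    # Piecewise linear activation: 0 for x <= 0, x otherwise.
    return max(0.0, x)

def relu_grad(x):
    # Gradient is a step function; the value at x == 0 is a convention.
    return 1.0 if x > 0 else 0.0

for v in (-5.0, -0.3, 0.0, 1.0, 10.0):
    print(v, exp_sketch(v), math.exp(v), relu(v), relu_grad(v))
```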
The whole "universal approximation" perspective is pretty vague to begin with. I'd say people generally don't understand why NNs work as well as they do. Theorists previously expected that models this complex would need far more training data to work, so the field is driven to a large degree by empirical success. I am certainly really interested to see people accomplishing the same things with less sophisticated methods, since there is no doubt NNs have been overused and hyped in some areas just to make the papers and proposals sexier.