|
To be clear about terms: interpolation, unlike regression, means the curve has to pass through every data point exactly. For interpolation, polynomial interpolation gives a unique polynomial of lowest possible degree (degree at most n for n+1 points with distinct x values) [1]; I'm not sure an NN would have fewer parameters than this for interpolation, strictly speaking. To your point, I believe you meant "rough interpolation", and it's true that in many cases NNs might produce a less overfitted approximating function when one has no prior knowledge of the generating function. But if one can exploit prior knowledge, one can select an optimal set of basis functions and fit a more parsimonious model than an NN. For instance, if you knew that a nonlinear function was composed of sin, cos and log terms, selecting these as basis functions and finding the correct functional form [2] would likely help an optimizer find a more parsimonious model than an NN using standard activation functions (ReLU, sigmoid, etc.).

As a thought experiment, suppose the generating function was this (5 parameters):

y = a1*log(a2*x)/cos(a3*x) + a4*sin(a5*x)
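
To make the thought experiment concrete, here is a rough sketch of fitting that 5-parameter form directly, assuming NumPy and SciPy are available. The "true" parameter values, the sampling interval, the bounds and the starting guess are all made up for illustration; none of them come from a real dataset.

```python
import numpy as np
from scipy.optimize import curve_fit


def model(x, a1, a2, a3, a4, a5):
    """Generating function from the thought experiment (5 parameters)."""
    return a1 * np.log(a2 * x) / np.cos(a3 * x) + a4 * np.sin(a5 * x)


# Hypothetical "true" parameters, chosen only for illustration.
true_params = np.array([1.5, 2.0, 0.5, 0.8, 3.0])

# Sample on an interval where a2*x > 0 and cos(a3*x) stays away from zero.
x = np.linspace(0.5, 2.0, 200)
y = model(x, *true_params)  # noiseless data, to keep the sketch simple

# Starting guess deliberately close to the truth (an assumption): nonlinear
# least squares can end up in a different local minimum from a poor start.
p0 = true_params * 1.1

# Bounds keep the log argument positive and the cosine away from its zeros.
lower = [-10.0, 1e-3, 0.0, -10.0, 0.0]
upper = [10.0, 10.0, 0.7, 10.0, 10.0]

fitted, _ = curve_fit(model, x, y, p0=p0, bounds=(lower, upper))

max_abs_error = float(np.max(np.abs(model(x, *fitted) - y)))
print("fitted parameters:", np.round(fitted, 4))
print("max abs error:", max_abs_error)
```

The whole model here is five numbers. Note that this only works because the functional form is handed to the optimizer up front, which is exactly the prior knowledge being discussed.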
If one attempted to fit this with log, cos and sin basis functions, one is likely to recover this form with ~5 parameters. But suppose we tried to fit it with an NN, with the stipulation that the approximation error stays under some ε. I suspect we'd need quite a bit more than 5 parameters.

NNs tend to generalize better (assuming proper regularization) than polynomial approximations and have fewer numerical problems like Runge's phenomenon, but I don't think NNs aim for (or have results that demonstrate) parsimony in parameters.

[1] https://en.wikipedia.org/wiki/Polynomial_interpolation

[2] If the functional form is unknown, there are techniques like "symbolic regression" that do a structure search to find a well-fitting functional form. https://en.wikipedia.org/wiki/Symbolic_regression |