by suref 2011 days ago
There's no rule that when the number of parameters is small deep learning shouldn't be used. The one case where deep learning maybe shouldn't be attempted at all is when the number of samples is very limited. While it excels with high-dimensional hierarchical data, it can do well on other problems too. It differs from problem to problem, and usually multiple solutions are tried and compared, starting with EDA and linear regression.

> There's no rule that when the number of parameters is small deep learning shouldn't be used

I would be genuinely interested in examples of problems with a very low number of predictors (say two to five) where a neural net would be appropriate (where, as you say, less complex methods have been tried and failed).

I just can't think of one.

Suppose you have to fit a fairly non-linear curve to make interpolated predictions. A NN could do that with fewer parameters than most other models.

I can't think of a method that would use fewer parameters. If nothing else, it's a decent way to compress the data set for interpolation (on nearby averages) as a use case, no?

For interpolation (just to be clear, not regression, i.e. interpolation means the curve has to pass through every point in the data exactly), polynomial interpolation gives a unique polynomial of lowest possible degree [1]; I'm not sure a NN would have fewer parameters than this for interpolation, strictly speaking.
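To make the parameter count concrete, here is a minimal sketch of exact polynomial interpolation via Newton's divided differences. The sample points are made up for illustration, and the function names are my own. The point is just that n data points give exactly n coefficients, and the resulting polynomial passes through every point.

```python
def newton_coefficients(xs, ys):
    """Return the divided-difference coefficients of the interpolating polynomial."""
    coeffs = list(ys)
    n = len(xs)
    for level in range(1, n):
        # Walk backwards so the lower-order differences are still available.
        for i in range(n - 1, level - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - level])
    return coeffs


def newton_eval(xs, coeffs, x):
    """Evaluate the Newton-form polynomial at x using nested multiplication."""
    result = coeffs[-1]
    for i in range(len(coeffs) - 2, -1, -1):
        result = result * (x - xs[i]) + coeffs[i]
    return result


# Hypothetical sample data: four points, so four coefficients (degree <= 3).
xs = [0.0, 1.0, 2.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0]

coeffs = newton_coefficients(xs, ys)
fitted = [newton_eval(xs, coeffs, x) for x in xs]

print("number of parameters:", len(coeffs))
print("values at the data points:", fitted)
```

So for strict interpolation the parameter budget is tied to the number of points, whatever the shape of the curve.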

To your point, I believe you meant "rough interpolation", and it's true that in many cases NNs might produce a less overfitted approximating function if one has no prior knowledge of the generating function.

But if one can exploit prior knowledge, one can select an optimal set of basis functions and fit a more parsimonious model than a NN. For instance, if you knew that a nonlinear function was a function of sin, cos and logs, selecting these as basis functions and finding the correct functional form [2] would likely help an optimizer find a more parsimonious model than a NN using standard activation functions (ReLU, sigmoid, etc.). As a thought experiment, suppose the generating function was this, with 5 parameters:

  y = a1*log(a2*x)/cos(a3*x) + a4*sin(a5*x)
If one attempted to fit this with log, cos and sin basis functions, one is likely to recover this form with ~5 parameters. But suppose we tried to fit this with an NN with the stipulation that the approximation error is under some ε -- I suspect we'll need quite a bit more than 5 parameters.
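For a rough sense of the budget, here is a small sketch that counts the trainable parameters of a plain fully connected network with one input and one output. The helper name and the hidden sizes are my own choices. It says nothing about how many hidden units a given ε would actually require, it only shows how quickly the count passes 5.

```python
def mlp_param_count(layer_sizes):
    """Count weights plus biases in a fully connected network."""
    return sum(
        n_in * n_out + n_out  # weight matrix plus one bias per output unit
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


GENERATING_PARAMS = 5  # a1..a5 in the formula above

# One input (x), one hidden layer of varying width, one output (y).
counts = {hidden: mlp_param_count([1, hidden, 1]) for hidden in (1, 2, 4, 8)}

for hidden, total in counts.items():
    print(f"{hidden} hidden units -> {total} parameters "
          f"(generating function: {GENERATING_PARAMS})")
```

A 1-H-1 network has 3H + 1 parameters, so anything with two or more hidden units is already over 5.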

NNs tend to generalize better (assuming proper regularization) than polynomial approximations and have fewer numerical problems like Runge's phenomenon, but I don't think NNs aim for (or have results that demonstrate) parsimony in parameters.
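For anyone who hasn't seen Runge's phenomenon, here is a quick sketch using the classic test function 1/(1 + 25x^2) on [-1, 1] with equally spaced nodes. The node counts and the probe grid are arbitrary choices of mine. The interpolant hits every node exactly, yet the worst-case error between nodes gets larger as more nodes are added.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def runge(x):
    """The classic Runge test function."""
    return 1.0 / (1.0 + 25.0 * x * x)


def max_interp_error(num_nodes, num_probes=401):
    """Largest error on a probe grid when interpolating at equispaced nodes."""
    nodes = [-1.0 + 2.0 * k / (num_nodes - 1) for k in range(num_nodes)]
    values = [runge(x) for x in nodes]
    probes = [-1.0 + 2.0 * k / (num_probes - 1) for k in range(num_probes)]
    return max(abs(lagrange_eval(nodes, values, x) - runge(x)) for x in probes)


errors = {n: max_interp_error(n) for n in (6, 11, 21)}
for n, err in errors.items():
    print(f"{n} equispaced nodes -> max error {err:.3f}")
```

The oscillation is a property of high-degree polynomials on equispaced nodes, not of interpolation in general; other node placements behave much better.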

[1] https://en.wikipedia.org/wiki/Polynomial_interpolation

[2] If the functional form is unknown, there are techniques like "symbolic regression" that attempt to do a structure search to find a well-fitting structure. https://en.wikipedia.org/wiki/Symbolic_regression

For online learning, multi-output, non-negative output, unlabeled data, etc., neural networks work well. The power of deep learning lies in how you can shape the problem and loss function for specific purposes. And even if these circumstances do not exist, they can do well; it's all problem specific.
Historically the XOR function has been the simple example that many ML algorithms can't handle. Just imagine a higher dimensional XOR with outliers, and you have a pretty good use case for DL with limited predictors.
Historically, this was solved in the 80's with the multi-layer perceptron, but it seems it still gets repeated.
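To illustrate the point, here is a minimal sketch of a two-layer perceptron with hand-picked weights (no training) that computes XOR, next to a coarse grid search over single threshold units. The weights and the grid are my own choices. The grid search is only an illustration, not a proof, although XOR is known not to be linearly separable.

```python
def step(z):
    """Hard threshold activation."""
    return 1 if z > 0 else 0


def xor_mlp(x1, x2):
    """2-2-1 perceptron with hand-set weights: XOR = OR and not AND."""
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    return step(h_or - h_and - 0.5)


def single_unit(w1, w2, b, x1, x2):
    """A single linear threshold unit, i.e. a one-layer perceptron."""
    return step(w1 * x1 + w2 * x2 + b)


XOR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

mlp_outputs = {inputs: xor_mlp(*inputs) for inputs in XOR_TABLE}

# Coarse search over weights and bias in [-4, 4] with step 0.5.
grid = [k / 2 for k in range(-8, 9)]
single_unit_solves_xor = any(
    all(single_unit(w1, w2, b, *inputs) == target
        for inputs, target in XOR_TABLE.items())
    for w1 in grid for w2 in grid for b in grid
)

print("two-layer outputs:", mlp_outputs)
print("any single unit on the grid solves XOR:", single_unit_solves_xor)
```

The hidden layer is what does the work here: one unit detects OR, the other detects AND, and the output unit subtracts them.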