| HN Mirror



	by chriszhang 2057 days ago
	We trained a deep learning model to look at about 20 system parameters and predict an output. The parameters were binary, so one curious engineer decided to brute-force the trained model with all possible inputs (2^20 of them) to see what the model does. He found that, for the problem we were solving, only 4 of the 20 parameters had any effect on the results; the remaining 16 or so did not affect them. So he replaced the model with a single line of code: one boolean expression combining those 4 parameters with logical operators.
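A minimal sketch of the brute-force check described above, assuming the model can be called as a plain function on an integer whose bits are the 20 binary parameters. The `model` function here is a hypothetical stand-in (the real model and the 4 parameters that mattered are not given in the post), and the indices 2, 5, 11 and 17 are invented for illustration.

```python
N_PARAMS = 20  # number of binary inputs, as in the post


def model(x: int) -> bool:
    """Hypothetical stand-in for the trained model.

    Bit i of x is parameter i. Only parameters 2, 5, 11 and 17 matter here;
    both the indices and the expression are made up for this example.
    """
    p2, p5 = (x >> 2) & 1, (x >> 5) & 1
    p11, p17 = (x >> 11) & 1, (x >> 17) & 1
    return bool((p2 and p5) or (p11 and not p17))


def influential_parameters(predict, n_params):
    """Return the parameters whose value changes the output for at least one input."""
    # Evaluate the model once on every possible input (2**n_params calls).
    table = bytes(predict(x) for x in range(1 << n_params))
    influential = []
    for i in range(n_params):
        stride = 1 << i
        # Within each block of 2*stride inputs, the first half has bit i = 0 and
        # the second half has bit i = 1, with all other bits identical.
        for start in range(0, len(table), 2 * stride):
            low = table[start:start + stride]
            high = table[start + stride:start + 2 * stride]
            if low != high:
                influential.append(i)
                break
    return influential


result = influential_parameters(model, N_PARAMS)
print(result)
```

A parameter that never shows up in the result can be dropped, and the truth table over the remaining ones can then be turned into a boolean expression by hand.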

6 comments

teruakohatu 2057 days ago

That kind of problem, with such a limited number of parameters, really shouldn't be thrown into a neural network. A decision tree (or variant) might have been the ideal ML technique, and you may have been able to quickly see which parameters mattered and reduce the four parameters to code if needed.

Neural networks make sense with a huge number of input parameters, where feature selection is really tricky to reason about and decision boundaries are very non-linear, such as in image classification.

Edited: slight clarification
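To illustrate the decision-tree suggestion, here is a rough sketch of the quantity a tree looks at when choosing its first split: the information gain of each binary parameter. The labelling rule and the parameter indices are hypothetical, and the data is randomly generated rather than taken from the original problem.

```python
import math
import random


def entropy(labels):
    """Shannon entropy (in bits) of a list of boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(rows, labels, feature):
    """Entropy reduction from splitting the data on one binary feature."""
    off = [y for x, y in zip(rows, labels) if not x[feature]]
    on = [y for x, y in zip(rows, labels) if x[feature]]
    n = len(labels)
    return entropy(labels) - len(off) / n * entropy(off) - len(on) / n * entropy(on)


def rule(x):
    """Hypothetical ground truth: only parameters 2, 5, 11 and 17 matter."""
    return (x[2] and x[5]) or (x[11] and not x[17])


random.seed(0)  # fixed seed so the sketch is reproducible
rows = [tuple(random.random() < 0.5 for _ in range(20)) for _ in range(5000)]
labels = [rule(x) for x in rows]

gains = {f: information_gain(rows, labels, f) for f in range(20)}
top4 = sorted(sorted(gains, key=gains.get, reverse=True)[:4])
print(top4)
```

One caveat: scoring one feature at a time can miss parameters that only matter in combination (an XOR-like rule gives each input zero gain on its own), so this is a quick screen rather than a guarantee.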

tluyben2 2057 days ago

When I studied AI in uni, in the cold cold (cold) winter of AI, this kind of input was really significant, but most problems we consider ML now are vastly more complex, and other problems can be addressed by things that are no longer considered AI at all (while they were back then).

It is funny how the neural nets from my uni's top research (on $1M computers) are considered to make no sense anymore. That went a lot faster than programming.

suref 2057 days ago

There's no rule that when the number of parameters is small deep learning shouldn't be used. The one time where deep learning maybe shouldn't be attempted at all is when the number of samples is very limited. While it excels with high-dimensional hierarchical data, it can do well on other problems as well. It differs from problem to problem, and usually multiple solutions are tried and compared, starting with EDA and linear regression.

teruakohatu 2057 days ago

> There's no rule that when the number of parameters is small deep learning shouldn't be used

I would be genuinely interested in examples of problems with a very low number of predictors (say two to five) where a neural net would be appropriate (where, as you say, less complex methods have been tried and failed).

I just can't think of one.

Enginerrrd 2057 days ago

Suppose you have to fit a fairly non-linear curve to make interpolated predictions. A NN could do that with fewer parameters than most other models.

I can't think of a method that would use fewer parameters. If nothing else, it's a decent way to compress the data set for interpolation (on nearby averages) as a use case, no?

wenc 2057 days ago

For interpolation (just to be clear, not regression, i.e. interpolation means the curve has to pass through every point in the data exactly), polynomial interpolation gives a unique polynomial of lowest possible degree [1]; I'm not sure a NN would have fewer parameters than this for interpolation, strictly speaking.

To your point, I believe you meant "rough interpolation", and it's true in many cases NN's might produce a less overfitted approximating function if one has no prior knowledge of the generating function.

But if one can exploit prior knowledge, one can select an optimal set of basis functions and fit a more parsimonious model than a NN. For instance, if you knew that a nonlinear function was a function of sin, cos and logs, selecting these as basis functions and finding the correct functional form [2] would likely help an optimizer find a more parsimonious model than a NN using standard activation functions (ReLU, sigmoid, etc). As a thought experiment, suppose the generating function was this: (5 parameters)

  y = a1*log(a2*x)/cos(a3*x) + a4*sin(a5*x)

If one attempted to fit this with log, cos and sin basis functions, one is likely to recover this form with ~5 parameters. But suppose we tried to fit this with an NN with the stipulation that the approximation error is under some ε -- I suspect we'll need quite a bit more than 5 parameters.

NN's tend to generalize better (assuming proper regularization) than polynomial approximations and have fewer numerical problems like Runge's phenomenon, but I don't think NNs aim for (or have results that demonstrate) parsimony in parameters.

[1] https://en.wikipedia.org/wiki/Polynomial_interpolation

[2] If the functional form is unknown, there are techniques like "symbolic regression" that attempt to do a structure search to find a well-fitting structure. https://en.wikipedia.org/wiki/Symbolic_regression
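As a small illustration of the exact-interpolation point above: n data points determine a unique polynomial of degree at most n - 1, i.e. n parameters. The sketch below evaluates that polynomial in Lagrange form using exact fractions. The sample points are made up (they come from the cubic x^3 + x + 1) and the function name is only illustrative.

```python
from fractions import Fraction


def lagrange_interpolate(points, x):
    """Evaluate at x the unique lowest-degree polynomial through all points."""
    x = Fraction(x)
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        # Basis polynomial i equals 1 at xi and 0 at every other sample x.
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / Fraction(xi - xj)
        total += term
    return total


# Four samples of the (made-up) cubic f(x) = x^3 + x + 1.
points = [(0, 1), (1, 3), (2, 11), (3, 31)]

# The interpolant passes through every sample exactly...
assert all(lagrange_interpolate(points, xi) == yi for xi, yi in points)

# ...and, since four points pin down a cubic, it reproduces f elsewhere too.
print(lagrange_interpolate(points, 4))  # 69, i.e. 4^3 + 4 + 1
```

This only covers exact interpolation of clean data; with noisy samples or many points, the same construction tends to oscillate, which is where the regression-style alternatives discussed above come in.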

suref 2057 days ago

For online learning, multi-output, non-negative output, unlabeled data, etc., neural networks work well. The power of deep learning lies in how you can shape the problem and loss function for specific purposes. And even if these circumstances do not exist they can do well; it's all problem specific.

CuriouslyC 2057 days ago

Historically the XOR function has been the simple example that many ML algorithms can't handle. Just imagine a higher dimensional XOR with outliers, and you have a pretty good use case for DL with limited predictors.

dr_zoidberg 2057 days ago

Historically, this was solved in the 80's with the multi-layer perceptron, but it seems it still gets repeated.
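For readers who haven't seen it, a minimal sketch of the XOR point: a single threshold unit cannot compute XOR, while a network with one hidden layer of two units can. The weights below are set by hand for illustration rather than learned, and the grid search is only a finite sanity check, not a proof of the impossibility result.

```python
from itertools import product

XOR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def step(z):
    """Threshold activation: 1 if z is positive, else 0."""
    return 1 if z > 0 else 0


def xor_mlp(a, b):
    """Two-layer perceptron with hand-picked weights computing XOR."""
    h_or = step(a + b - 0.5)    # fires if at least one input is 1
    h_and = step(a + b - 1.5)   # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # OR but not AND


def single_unit_solves_xor(w1, w2, bias):
    """Check whether one threshold unit reproduces the whole XOR table."""
    return all(step(w1 * a + w2 * b + bias) == y for (a, b), y in XOR_TABLE.items())


# The hidden layer makes XOR representable.
assert all(xor_mlp(a, b) == y for (a, b), y in XOR_TABLE.items())

# No single unit on this weight grid manages it (XOR is not linearly separable).
grid = [k / 2 for k in range(-8, 9)]  # -4.0 to 4.0 in steps of 0.5
single_layer_hits = [w for w in product(grid, repeat=3) if single_unit_solves_xor(*w)]
print(len(single_layer_hits))  # 0
```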

malux85 2057 days ago

20 parameters is WAY too few for a deepnet. Deepnets are better suited to very high-dimensional spaces where the data has sparsity and a hierarchical structure, and where it can take advantage of the net's rotational and translational invariance etc. (if that architecture is used).

You can't pick completely the wrong tool, and then complain about how unsuitable it was.

ramraj07 2057 days ago

Was there any rationale to using a "deep learning model" to train a 20 parameter model? Sounds like some amateur DS convinced the team this is a good idea because he thought deep learning was cool?

m_mueller 2057 days ago

For up to 100ish parameters, even mixed with floating point, I recommend trying the midaco solver a friend of mine develops. MINLP, ant colony method (i.e. gradient descent with many restarts). From my experience this runs circles around NNs for this class of problems (parameter optimization with relatively low complexity and/or limited amount of training data available).

wiz21c 2057 days ago

Yep, there are well-established and powerful tools for various problems: linear programming, boolean satisfiability, analytical solutions, etc. Neural networks and co. are for a very specific, yet large, class of problems.

fnord77 2057 days ago

wouldn't principal component analysis have done the same thing without the brute forcing?

scottlocklin 2057 days ago

The engineer wouldn't have been able to waste the week feeling fancy fiddling with GPUs in that case.

billo-ollib 2057 days ago

Since the variables are categorical, won't PCA have to be modified to use it effectively? To my knowledge, PCA can only be used for continuous variables.

shbm 2057 days ago

Yes. I think so. Or maybe some correlation plots. I doubt that a proper EDA was performed pre-modelling.

29athrowaway 2057 days ago

Have you tried training the network using dropout?

chriszhang 2057 days ago

yes, dropout and other regularization techniques were used