| HN Mirror


by YeGoblynQueenne 2347 days ago

>> But the beauty of neural networks is that they can be very good at generalizing from a partial sample of the problem space.

That is really not the case. Neural nets generalise very poorly, hence the need for ever larger amounts of data: to overcome their lack of generalisation by attempting to cover as many "cases" as possible.

Edit: when this subject comes up I cite the following article, by François Chollet, maintainer of Keras:

The limitations of deep learning

https://blog.keras.io/the-limitations-of-deep-learning.html

I quote from the article:

This stands in sharp contrast with what deep nets do, which I would call "local generalization": the mapping from inputs to outputs performed by deep nets quickly stops making sense if new inputs differ even slightly from what they saw at training time. Consider, for instance, the problem of learning the appropriate launch parameters to get a rocket to land on the moon. If you were to use a deep net for this task, whether training using supervised learning or reinforcement learning, you would need to feed it with thousands or even millions of launch trials, i.e. you would need to expose it to a dense sampling of the input space, in order to learn a reliable mapping from input space to output space.

2 comments

allovernow 2346 days ago

Well...I think that take is a little overly cynical, and I disagree particularly with this:

>the mapping from inputs to outputs performed by deep nets quickly stops making sense if new inputs differ even slightly from what they saw at training time

In my experience that isn't really true, if you have an appropriately designed net, training data which appropriately samples the problem space, and the net is not overtrained (overfit).

You can think of training data as representing points in high dimensional space. Like any interpolation problem, if you sample the space with the right density, you can get accurate interpolation results - and neural nets have another huge advantage, in that they learn highly nonlinear interpolation in these high d spaces. So the net may be unlikely to generalize to points outside of the sampled space - although now that I think of it I'm not sure of how nets handle extrapolation - but when you're dealing with space with thousands of dimensions (like each pixel in an image) you can still derive a ton of utility from the interpolation which effectively replaces hardcoded rules about the problem you're solving.
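The interpolation-versus-extrapolation point above can be seen in a small sketch. This is an illustration under stated assumptions, not a neural net: a degree-7 polynomial least-squares fit stands in for the learned model, and `target`, `x_train`, `x_inside` and `x_outside` are hypothetical names chosen for the example.

```python
import numpy as np

# Stand-in for "the problem": any smooth nonlinear function will do here.
def target(x):
    return np.sin(x)

# Training samples cover only the interval [0, 2*pi].
x_train = np.linspace(0.0, 2.0 * np.pi, 50)
y_train = target(x_train)

# A degree-7 polynomial least-squares fit plays the role of the learned model
# (an assumption for illustration; it is not a neural net).
model = np.polynomial.Polynomial.fit(x_train, y_train, deg=7)

# Points inside the sampled interval (interpolation) ...
x_inside = np.linspace(0.5, 5.5, 200)
# ... and points well outside it (extrapolation).
x_outside = np.linspace(3.0 * np.pi, 4.0 * np.pi, 200)

err_inside = np.max(np.abs(model(x_inside) - target(x_inside)))
err_outside = np.max(np.abs(model(x_outside) - target(x_outside)))

print(f"max error inside sampled range:  {err_inside:.2e}")
print(f"max error outside sampled range: {err_outside:.2e}")
```

The fit tracks the function closely between the sampled points and diverges once the inputs leave the sampled interval. Whether a particular net behaves the same way depends on its architecture and training, so treat this only as a picture of the general idea.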


YeGoblynQueenne 2346 days ago

I may be jumping the gun a little because I was thinking about this in the context of another thread, but a practical problem with machine learning in general is that, for a learned model to generalise well to unseen data, the training dataset (all the data that you have available, regardless of how you partition it into training, testing and validation) must be drawn from the same distribution as the "real world" data.

The actual problem is that this is very difficult, if not impossible, to know before training begins. Most of the time, the best that can be achieved is to train a model on whatever data you have and then painstakingly test it at length and at some cost, on the real-world inputs the trained model has to operate on.

Basically, it's very hard to know your sampling error.

Regarding interpolation and dense sampling etc, the larger the dimensionality of the problem the harder it gets to ensure your data is "dense", let alone that it covers an adequate region of the instance space. For example, the pixels in one image are a tiny, tiny subset of all pixels in all possible images, which is what you really want to represent. Come to that, the pixels in many hundreds of thousands of images are still a tiny, tiny subset of all pixels in all possible images. I find Chollet's criticism not cynical, but pragmatic and very useful. It's important to understand the limitations of whatever tool you're using.
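To put a rough number on "tiny, tiny subset", here is a back-of-the-envelope sketch. The setting is an assumption made to keep the arithmetic small: 28x28 images with one bit per pixel, and a hypothetical dataset of 500,000 distinct images. Real images have far more possible states, so the true fraction is smaller still.

```python
import math

# Hypothetical setting: 28x28 images with 1 bit per pixel (an assumption,
# chosen to keep the numbers small; real images have far more states).
pixels = 28 * 28
possible_images = 2 ** pixels  # exact integer, Python ints are unbounded

# Hypothetical dataset size, standing in for "many hundreds of thousands".
dataset_size = 500_000

# Number of decimal digits in the count of possible images.
digits = len(str(possible_images))

# Fraction of the space covered, expressed as a power of ten.
log10_fraction = math.log10(dataset_size) - pixels * math.log10(2)

print(f"possible images: a {digits}-digit number")
print(f"dataset covers roughly 10^{log10_fraction:.0f} of the space")
```

This counts raw pixel configurations only. Most of those configurations are noise rather than meaningful images, so the count is an upper bound on the space of interest, not a measure of it.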

>> although now that I think of it I'm not sure of how nets handle extrapolation

They don't. It's the gradient optimisation. Gets stuck in local minima, always has, always will. Maybe a new training method will come along at some point. Until then don't expect extrapolation.


richk449 2347 days ago

It doesn’t need to generalize, just do sophisticated interpolation.

Basing the results on a dense sampling of the input space is exactly what I was suggesting.


YeGoblynQueenne 2346 days ago

Apologies for the misunderstanding. You said "generalizing from a partial sample of the problem space" and I thought you meant generalisation to unseen data from few examples, which is generally what we would all like to get from machine learning models (but don't).

But, if a neural net can't _extrapolate_ to unseen instances, I don't see how it can solve problems like the one you describe with any useful precision, again unless it's trained with gigantic amounts of examples (which you say is not required). And how is this reducing computational costs with respect to hand-coded solvers?


richk449 2346 days ago

To be clear - I have absolutely no experience in this domain. I'm just speculating.

In the example I gave, everyone agrees that if you had long enough and enough processing power, you could solve every possible configuration, and store the results. Then you could instantaneously "solve" any problem.

Unfortunately, the problem I describe is a toy problem (too simple to be useful), and yet it would still take way way too long to solve all the possible configurations.

What if you solved some tiny fraction of the configurations though? That would be a sampling of the configuration space. Then a neural network could use that sampling to interpolate to the cases not solved. That would provide a significant speedup over actually solving the problem.

So the real question is what density you need to pre-solve the configuration space to make it work? It definitely depends on what accuracy you need in the solution, as well as how well you can do with the interpolation. If I said previously that gigantic numbers of examples are not needed, then I misspoke. I am sure they would be needed. Gigantic is vague though - is it the kind of number that can be rented from AWS, or is it the kind of number that would require civilization resources?
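A toy version of the density question can be sketched as follows. Everything here is an assumption for illustration: `expensive_solve` is a hypothetical stand-in for the real solver, the configuration space is one-dimensional, and plain linear interpolation takes the place of a neural network.

```python
import numpy as np

# Hypothetical stand-in for an expensive solver: maps a configuration x
# to its solution. A real solver would be far costlier than this.
def expensive_solve(x):
    return np.sin(3.0 * x) + 0.5 * x

def max_interp_error(n_presolved):
    # Pre-solve n evenly spaced configurations in [0, 1] ...
    grid = np.linspace(0.0, 1.0, n_presolved)
    table = expensive_solve(grid)
    # ... then answer unseen configurations by linear interpolation.
    queries = np.linspace(0.0, 1.0, 10_001)
    approx = np.interp(queries, grid, table)
    return np.max(np.abs(approx - expensive_solve(queries)))

densities = [5, 50, 500]
errors = [max_interp_error(n) for n in densities]
for n, e in zip(densities, errors):
    print(f"{n:4d} pre-solved points -> max error {e:.2e}")
```

In one dimension the error of linear interpolation on a smooth function shrinks with the square of the grid spacing, so a modest table goes a long way. A regular grid with the same spacing in d dimensions needs n^d pre-solved points, which is where the "AWS or civilization resources" question comes from.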

I have no idea if the math actually works out to make it a useful approach. All I am saying is that conceptually I can see that in some cases, it could be possible.


YeGoblynQueenne 2345 days ago

>> So the real question is what density you need to pre-solve the configuration space to make it work?

Yes, that's the main question. I don't know the answer of course but if we're talking about an engineering problem where precision is required, intuitively the more the merrier.

The thing is, with neural nets you can do lots of things in principle and many things "in the lab". Taking them into the real world is the tricky bit. Anyway, another poster here is saying we'll see big things in the next five years so let's hold on to our hats for now.
