>> But the beauty of neural networks is that they can be very good at generalizing from a partial sample of the problem space.

That is really not the case. Neural nets generalise very poorly, hence the need for ever larger amounts of data: to overcome their lack of generalisation by attempting to cover as many "cases" as possible.

Edit: when this subject comes up I cite the following article, by François Chollet, maintainer of Keras:

The limitations of deep learning

https://blog.keras.io/the-limitations-of-deep-learning.html

I quote from the article:

This stands in sharp contrast with what deep nets do, which I would call "local generalization": the mapping from inputs to outputs performed by deep nets quickly stops making sense if new inputs differ even slightly from what they saw at training time. Consider, for instance, the problem of learning the appropriate launch parameters to get a rocket to land on the moon. If you were to use a deep net for this task, whether training using supervised learning or reinforcement learning, you would need to feed it with thousands or even millions of launch trials, i.e. you would need to expose it to a dense sampling of the input space, in order to learn a reliable mapping from input space to output space.
>the mapping from inputs to outputs performed by deep nets quickly stops making sense if new inputs differ even slightly from what they saw at training time
In my experience that isn't really true, provided you have an appropriately designed net, training data that appropriately samples the problem space, and a net that is not overtrained (overfit).
You can think of training data as representing points in a high-dimensional space. Like any interpolation problem, if you sample the space with the right density, you can get accurate interpolation results. Neural nets have another huge advantage here, in that they learn highly nonlinear interpolation in these high-dimensional spaces. So the net may be unlikely to generalize to points outside of the sampled space (although now that I think of it, I'm not sure how nets handle extrapolation), but when you're dealing with a space with thousands of dimensions (like each pixel in an image) you can still derive a ton of utility from the interpolation, which effectively replaces hardcoded rules about the problem you're solving.
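To make the interpolation vs. extrapolation point concrete, here is a toy sketch in one dimension rather than thousands. It is not a properly trained deep net: as a simplifying assumption it uses a single tanh hidden layer with fixed random weights and fits only the output layer by ridge-regularized least squares. The target function (`sin`), the sampled range `[-3, 3]`, the outside range `[4, 6]`, and all names and hyperparameters are my own choices for illustration.

```python
import numpy as np


def fit_random_feature_net(x_train, y_train, n_hidden=50, ridge=1e-6, seed=0):
    """Fit a one-hidden-layer tanh model with fixed random hidden weights.

    Only the output layer is fitted (ridge least squares). This is a
    stand-in for a trained net, chosen so the example is deterministic.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 1.5, size=n_hidden)    # hidden weights (never trained)
    b = rng.uniform(-3.0, 3.0, size=n_hidden)  # hidden biases (never trained)

    def features(x):
        hidden = np.tanh(np.outer(x, w) + b)   # shape (n_samples, n_hidden)
        return np.column_stack([hidden, np.ones_like(x)])  # add output bias

    H = features(x_train)
    n_cols = H.shape[1]
    # Ridge regression written as an augmented least-squares problem.
    A = np.vstack([H, np.sqrt(ridge) * np.eye(n_cols)])
    t = np.concatenate([y_train, np.zeros(n_cols)])
    coef = np.linalg.lstsq(A, t, rcond=None)[0]

    return lambda x: features(x) @ coef


def compare_interp_extrap(seed=0):
    """Return (MSE inside the sampled range, MSE outside it)."""
    rng = np.random.default_rng(seed + 1)
    x_train = np.linspace(-3.0, 3.0, 200)      # dense sampling of [-3, 3]
    predict = fit_random_feature_net(x_train, np.sin(x_train), seed=seed)

    x_inside = rng.uniform(-3.0, 3.0, 500)     # new points, same region
    x_outside = rng.uniform(4.0, 6.0, 500)     # new points, unsampled region

    interp_mse = float(np.mean((predict(x_inside) - np.sin(x_inside)) ** 2))
    extrap_mse = float(np.mean((predict(x_outside) - np.sin(x_outside)) ** 2))
    return interp_mse, extrap_mse


if __name__ == "__main__":
    interp_mse, extrap_mse = compare_interp_extrap()
    print(f"MSE inside sampled range:  {interp_mse:.3e}")
    print(f"MSE outside sampled range: {extrap_mse:.3e}")
```

The model should do well on unseen points that fall between the training samples and much worse once you leave the sampled region, which is the distinction I'm drawing. How well this carries over to deep nets on image-sized inputs is exactly the part being argued about, so treat it as an illustration of the claim and not as evidence for it.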