by nullc
4400 days ago
Another limit they don't address is that the training normally used is purely local: just gradient descent. So even when the network can model your function well, there is no guarantee that training will find the solution. For me ANNs always seem to get stuck on not very helpful local minima, and they're far from the first tool in my bag of tricks. I often associate them with the sort of thing that someone who doesn't really know what they're talking about talks about. (Esp. if it's clear that in their minds NNs have magical powers. :) Maybe they'll also mention something about "genetic algorithms".)
If it models the function over the input domain, then it is properly trained. If it is trained to a local minimum, then it doesn't model the underlying function well over the whole input domain. If you have good, representative training and validation sets, you will be able to tell.
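To make the validation point concrete, here is a minimal sketch. Everything in it is made up for illustration: the target is `math.sin`, `well_trained` and `stuck` are hypothetical stand-ins for trained networks (not real ANNs), and the two point sets stand in for a narrow and a representative validation set.

```python
import math

def mse(model, xs, target=math.sin):
    """Mean squared error of `model` against `target` on the points `xs`."""
    return sum((model(x) - target(x)) ** 2 for x in xs) / len(xs)

def well_trained(x):
    # Hypothetical good model: close to sin(x) over all of [-pi, pi].
    return x - x**3 / 6 + x**5 / 120 - x**7 / 5040

def stuck(x):
    # Hypothetical poorly trained model: only matches sin(x) near zero.
    return x

# A narrow set that covers only [-0.3, 0.3] hides the problem.
narrow = [i / 100 for i in range(-30, 31)]

# A representative set that covers the whole input domain exposes it.
representative = [-math.pi + i * (2 * math.pi / 200) for i in range(201)]

print("stuck, narrow set:        ", mse(stuck, narrow))
print("stuck, representative set:", mse(stuck, representative))
print("good,  representative set:", mse(well_trained, representative))
```

The `stuck` model looks fine on the narrow set and clearly bad on the representative one, which is the whole reason the validation data has to cover the input domain.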
> Esp. if it's clear that in their minds NNs have magical powers
I know that type. When dealing with ANNs you quickly realize (just like everywhere else in data science) that all of the "magic" relies on the manual work and thought that goes into cleaning and adapting the data. Not very sexy work, and work that requires a fair bit of knowledge about the problem domain.
> For me ANNs always seem to get stuck on not very helpful local minima
It isn't the ANN that gets stuck, it's the training algorithm (gradient descent) that gets stuck :) Training is orthogonal to the operation of the network itself (which is just a nonlinear function in the end!). Gradient descent via error backpropagation is the most common training method for MLPs, but you could imagine a random/brute-force search instead, which is significantly simpler to implement but slower. Since a network is often trained once and then used repeatedly, it is often plausible to train it for several weeks if needed! A pure random search is usually not feasible, but adding randomization to gradient descent will help. There are many ways to avoid local minima in gradient descent, if you have time to wait.
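As a rough illustration of "adding randomization to gradient descent", here is a sketch of one of the simplest variants, random restarts. It is only a toy under stated assumptions: `loss` is a hypothetical one-dimensional error surface with one local and one global minimum, not a real network, and the learning rate, step count, and restart count are arbitrary.

```python
import random

def loss(w):
    # Hypothetical error surface: local minimum near w = 1.13,
    # global minimum near w = -1.30.
    return w**4 - 3 * w**2 + w

def grad(w):
    # Derivative of loss(w).
    return 4 * w**3 - 6 * w + 1

def gradient_descent(w, lr=0.01, steps=2000):
    """Plain gradient descent from the starting point w."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def descent_with_restarts(n_restarts=20, seed=0):
    """Run gradient descent from random starting points, keep the best result."""
    rng = random.Random(seed)
    best_w = None
    for _ in range(n_restarts):
        w = gradient_descent(rng.uniform(-2.0, 2.0))
        if best_w is None or loss(w) < loss(best_w):
            best_w = w
    return best_w

single = gradient_descent(1.5)    # an unlucky start: ends in the local minimum
best = descent_with_restarts()    # restarts also reach the global minimum
print(single, loss(single))
print(best, loss(best))
```

The cost is exactly the "time to wait" trade-off: every restart is another full training run, and in a real network the weight space is far too large for restarts alone to guarantee anything.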
> maybe they'll also mention something about "genetic algorithms"
The simple error backpropagation methods only work well for normal feed-forward networks. Other topologies, e.g. recurrent networks, require more exotic methods. In my (limited) experience, though, genetic algorithms are rarely efficient as a training method.