by RandyRanderson 2769 days ago
FF NNs with even one hidden layer are universal approximators. That is, they do find the global min. What this doesn't tell you is that it's likely a huge graph and will take a looong time to optimize for even trivial data sets. There are lots of proofs around. That's why SGD is used, and on only a small subset of training points at a time.

Re 2: No.

Re 3: Yes.

2 comments

They are universal approximators, although the term "approximator" means there's an upper bound on how accurately they can emulate a given function (based on the size of the network). However, just because they are universal approximators doesn't mean that you can automatically infer the optimal number of connections, and the weight for each of those connections, that minimizes the loss over some dataset (derived from some function). Being a universal approximator does not imply you can automatically learn the best approximation of a given function. It's the difference between being capable of learning something and having learned it. If that makes sense.
I don't think you understand what universal approximation means. It means there are parameter settings that would reduce the approximation error as much as you want. It's an existential property. It does not mean that those parameters can be found.
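
To make the "existential" part concrete, here is a rough sketch (not from any proof cited in this thread) that writes down the weights of a one-hidden-layer ReLU net by hand so that it matches the piecewise-linear interpolant of a target function on [0, 1]. The names `build_relu_net`, `max_error` and `target`, and the choice of sin(2*pi*x) as the target, are made up for illustration. No training happens anywhere, so it says nothing about whether SGD would find these weights.

```python
import math


def relu(z):
    return z if z > 0.0 else 0.0


def build_relu_net(f, n_hidden, lo=0.0, hi=1.0):
    """Hand-pick weights for a 1-hidden-layer ReLU net (no optimizer involved).

    The resulting net equals the piecewise-linear interpolant of f on
    n_hidden equal segments of [lo, hi].
    """
    knots = [lo + (hi - lo) * k / n_hidden for k in range(n_hidden + 1)]
    values = [f(x) for x in knots]
    slopes = [
        (values[k + 1] - values[k]) / (knots[k + 1] - knots[k])
        for k in range(n_hidden)
    ]
    # Hidden unit k computes relu(x - knots[k]); its output weight is the
    # change in slope at that knot (the first unit carries the first slope).
    out_weights = [slopes[0]] + [
        slopes[k] - slopes[k - 1] for k in range(1, n_hidden)
    ]
    biases = knots[:-1]
    offset = values[0]  # output-layer bias

    def net(x):
        return offset + sum(w * relu(x - b) for w, b in zip(out_weights, biases))

    return net


def max_error(f, g, samples=2001, lo=0.0, hi=1.0):
    """Largest |f - g| over an evenly spaced grid (a proxy for the sup norm)."""
    grid = (lo + (hi - lo) * i / (samples - 1) for i in range(samples))
    return max(abs(f(x) - g(x)) for x in grid)


def target(x):
    # Hypothetical target function, chosen only for illustration.
    return math.sin(2.0 * math.pi * x)


if __name__ == "__main__":
    for n in (4, 16, 64, 256):
        print(n, max_error(target, build_relu_net(target, n)))
```

The error shrinks as the hidden layer grows, which is all the universal approximation property promises: good parameters exist for a big enough network. Here they were computed in closed form from the target itself, which is exactly what you can't do when all you have is a dataset.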

Anyway, this universal property of neural networks gets a lot of airtime and people go gaga over it. It's a complete red herring. It's not the first example of a universal approximator and not the last. There is no scarcity of universal approximators. There was no such scarcity even 100s of years ago. The explanation for the success of DNNs lies elsewhere.
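
As one example of an older universal approximator (my pick, the comment doesn't name any), polynomials can approximate any continuous function on [0, 1] arbitrarily well, and Bernstein polynomials give an explicit recipe. A minimal sketch, with `bernstein`, `max_error` and `target` being illustrative names and sin(2*pi*x) an arbitrary target:

```python
import math


def bernstein(f, n):
    """Return the degree-n Bernstein polynomial of f on [0, 1]."""
    coeffs = [f(k / n) for k in range(n + 1)]

    def poly(x):
        return sum(
            c * math.comb(n, k) * x**k * (1.0 - x) ** (n - k)
            for k, c in enumerate(coeffs)
        )

    return poly


def max_error(f, g, samples=501):
    """Largest |f - g| over an evenly spaced grid on [0, 1]."""
    return max(abs(f(i / (samples - 1)) - g(i / (samples - 1))) for i in range(samples))


def target(x):
    # Hypothetical target function, chosen only for illustration.
    return math.sin(2.0 * math.pi * x)


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, max_error(target, bernstein(target, n)))
```

The error keeps dropping as the degree goes up, with no neural network in sight. Convergence is slow, which fits the point above: being a universal approximator is cheap and doesn't by itself explain why one method works well in practice.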