by RandyRanderson
2769 days ago
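Feed-forward NNs with even one hidden layer are universal approximators. That is, with enough hidden units there is some setting of the weights that approximates the target function arbitrarily well; there are lots of proofs around. What this doesn't tell you is how to find those weights: the theorem is about existence, so it doesn't promise that training reaches the global min, and the network it calls for is likely a huge graph that will take a looong time to optimize for even trivial data sets. That's why SGD is used, and on only a small subset of the training points at a time. Re 2: No. Re 3: Yes.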
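To make the SGD point concrete, here's a rough sketch of minibatch SGD on a one-hidden-layer tanh network fitting sin(x). Everything in it is my own assumption for illustration (the function name `train_one_hidden_layer`, the sin target, the layer width, learning rate, and batch size); it's not from any particular library or paper, and it only shows the loss going down, not that a global min is reached.

```python
import numpy as np

def train_one_hidden_layer(hidden=32, steps=5000, batch=16, lr=0.01, seed=0):
    """Minibatch SGD on a 1-hidden-layer tanh net; all settings are illustrative."""
    rng = np.random.default_rng(seed)

    # Toy data set (assumed for the example): y = sin(x) on [-pi, pi].
    X = np.linspace(-np.pi, np.pi, 256).reshape(-1, 1)
    y = np.sin(X)

    # One hidden layer: x -> tanh(x W1 + b1) -> h W2 + b2
    W1 = rng.normal(0.0, 1.0, (1, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1))
    b2 = np.zeros(1)

    def full_mse():
        # Mean squared error over the whole data set.
        pred = np.tanh(X @ W1 + b1) @ W2 + b2
        return float(np.mean((pred - y) ** 2))

    initial = full_mse()
    for _ in range(steps):
        # Each step looks at only a small random subset of the training points.
        idx = rng.integers(0, len(X), size=batch)
        xb, yb = X[idx], y[idx]

        # Forward pass on the minibatch.
        h = np.tanh(xb @ W1 + b1)
        pred = h @ W2 + b2

        # Backward pass for the minibatch mean squared error.
        g_pred = 2.0 * (pred - yb) / batch
        g_W2 = h.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_z = (g_pred @ W2.T) * (1.0 - h ** 2)  # derivative of tanh
        g_W1 = xb.T @ g_z
        g_b1 = g_z.sum(axis=0)

        # Plain SGD update (in place, so full_mse sees the new weights).
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2

    return initial, full_mse()

if __name__ == "__main__":
    before, after = train_one_hidden_layer()
    print(f"MSE before: {before:.4f}, after: {after:.4f}")
```

Each update costs 16 points' worth of work instead of 256, which is the whole appeal: the gradient is noisy, but you get many more steps for the same compute.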