Hacker News
by forgot-im-old 732 days ago
It's not clear that a bunch of cascaded rectified linear functions will ever generalize to near-100% accuracy. The error floor is at a dangerous level regardless of training. AGI is needed to tackle the final 1%.

The universal approximation theorem disagrees. The question is how large the network should be and how much training data it needs. And for now it can only be tested experimentally.
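To illustrate the existence side of the argument, here is a minimal sketch (not from the thread) for a single continuous 1-D function. It builds a one-hidden-layer ReLU network by hand, with no training, so that the network equals the piecewise-linear interpolant of the target on `n` equal intervals. The maximum error then shrinks roughly as `1/n**2` as hidden units are added. The target `sin` on `[0, π]` and the helper names `build_relu_interpolant`, `evaluate` and `max_error` are hypothetical choices for illustration. Because the weights are constructed rather than learned, the sketch says nothing about whether SGD would actually find them, which is the learnability question raised in the reply.

```python
import math


def relu(z: float) -> float:
    return max(0.0, z)


def build_relu_interpolant(f, a, b, n):
    """Hypothetical helper: hand-construct (bias, [(weight, knot), ...]) for a
    one-hidden-layer ReLU net equal to the piecewise-linear interpolant of f
    on n equal intervals of [a, b]. No training is involved."""
    knots = [a + (b - a) * k / n for k in range(n + 1)]
    ys = [f(t) for t in knots]
    slopes = [(ys[k + 1] - ys[k]) / (knots[k + 1] - knots[k]) for k in range(n)]
    # First unit sets the initial slope; each later unit adds the slope change at its knot.
    units = [(slopes[0], knots[0])]
    for k in range(1, n):
        units.append((slopes[k] - slopes[k - 1], knots[k]))
    return ys[0], units


def evaluate(net, x):
    bias, units = net
    return bias + sum(w * relu(x - t) for w, t in units)


def max_error(f, net, a, b, samples=10_000):
    """Largest |f(x) - net(x)| over a uniform grid of sample points in [a, b]."""
    grid = (a + (b - a) * i / samples for i in range(samples + 1))
    return max(abs(f(x) - evaluate(net, x)) for x in grid)


if __name__ == "__main__":
    for n in (4, 16, 64):
        net = build_relu_interpolant(math.sin, 0.0, math.pi, n)
        print(f"{n:3d} hidden units: max |error| = {max_error(math.sin, net, 0.0, math.pi):.2e}")
```

Linear interpolation error is bounded by `h**2 / 8 * max|f''|` with `h = (b - a) / n`, so for `sin` every 4x increase in width should cut the error bound by about 16x.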
The universal approximation theorem doesn't cover realistic training algorithms like stochastic gradient descent. It only says suitable weights exist, not that training will find them; there's no learnability guarantee.
There's no theorem that SGD is insufficient. So, as I said, it's empirical.
You said it only depends on network size. I'm saying it's more likely impossible regardless of network size, due to fundamental limits in training methods.