Hacker News
by allthing 2612 days ago
I believe the universal approximation theorem covers the single-hidden-layer case, where it applies to continuous functions. Once another layer is added, arbitrary functions can be approximated.

From section 4.6.2 of Tom Mitchell's Machine Learning book: "Arbitrary functions. Any function can be approximated to arbitrary accuracy by a network with three layers of units (Cybenko 1988)."
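
To make the single-hidden-layer case concrete, here is a rough sketch (my own illustration, not from Mitchell or Cybenko) that hand-builds a one-hidden-layer sigmoid network for a continuous function on [0, 1]. Each hidden unit acts as a steep step at a grid-cell midpoint, and its output weight is how much the target changes across that cell. The names `build_network`, `max_error`, `n_hidden` and `steepness` are made up for this example, and nothing is trained; the weights are just written down.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_network(f, n_hidden, steepness):
    # Hypothetical helper: one hidden layer of n_hidden sigmoid units on [0, 1].
    # Unit i is centred on the midpoint of grid cell i, and its output weight
    # is the change in f across that cell, so the sum is a smoothed staircase.
    knots = [i / n_hidden for i in range(n_hidden + 1)]
    bias = f(knots[0])
    units = []
    for left, right in zip(knots, knots[1:]):
        centre = 0.5 * (left + right)
        units.append((centre, f(right) - f(left)))

    def network(x):
        return bias + sum(w * sigmoid(steepness * (x - c)) for c, w in units)

    return network


def max_error(f, net, samples=1000):
    # Largest absolute error over an evenly spaced sample of [0, 1].
    return max(abs(f(i / samples) - net(i / samples)) for i in range(samples + 1))


def target(x):
    # Example continuous target; any continuous function on [0, 1] would do.
    return math.sin(2 * math.pi * x)


coarse = build_network(target, n_hidden=10, steepness=1000.0)
fine = build_network(target, n_hidden=100, steepness=10000.0)

print("max error, 10 hidden units: ", max_error(target, coarse))
print("max error, 100 hidden units:", max_error(target, fine))
```

The error shrinks as hidden units are added, which is the flavour of the single-hidden-layer result for continuous functions. It says nothing about the three-layer claim for arbitrary functions, and it is only an illustration, not a proof.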