Hacker News
by czr 2874 days ago
Echoing the other respondents: if you don't have a nonlinearity, your whole network is just a sequence of linear transforms, which (multiplied out) is the same as a single linear transform. So removing the nonlinearities gives you, effectively, a one-layer network.
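
If it helps to see it concretely, here is a rough sketch in plain Python. The layer sizes (4 -> 5 -> 3 -> 2), the random weights, and the helper names (`matmul`, `matvec`, `random_matrix`) are all made up for illustration, and biases are left out to keep it short. Three stacked linear layers give the same output as the single matrix you get by multiplying them out.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(w, x):
    """Apply matrix w to vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def random_matrix(rows, cols, rng):
    """Hypothetical helper: a rows x cols matrix of uniform random weights."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)

# Three "layers" with no activation in between (sizes are arbitrary).
W1 = random_matrix(5, 4, rng)
W2 = random_matrix(3, 5, rng)
W3 = random_matrix(2, 3, rng)
x = [rng.uniform(-1, 1) for _ in range(4)]

# "Deep" network: apply the layers one after another.
deep = matvec(W3, matvec(W2, matvec(W1, x)))

# Collapsed network: multiply the weights out once, then apply a single layer.
W = matmul(W3, matmul(W2, W1))
shallow = matvec(W, x)

# The two agree up to floating-point rounding.
max_diff = max(abs(d - s) for d, s in zip(deep, shallow))
print(max_diff)

# A nonlinearity such as ReLU breaks this, because it is not additive:
relu = lambda t: max(0.0, t)
print(relu(1.0) + relu(-1.0), relu(1.0 + -1.0))  # 1.0 vs 0.0
```

Adding biases doesn't change the picture: a stack of affine layers still multiplies out to one affine layer.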