by dahart 3191 days ago
It's super interesting to think that any nonlinearity at all can make it work. This particular nonlinearity is surprising, since it clamps to zero at the center of the response curve. I'd have thought that's exactly where you want the linear response, and that clamping in the middle would cause bad things to happen. Sigmoid and ReLU (and others) clamp at the foot and/or shoulder instead. Perhaps this network just learns negative weights, compared to the traditional activation functions?
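To make "clamps to zero at the center" concrete, here is a minimal sketch of one hypothetical floating point nonlinearity with that shape. The thread doesn't say which operation produces it; as an assumption, this scales a value down toward float32's subnormal range by 2**-100 and back up by 2**100, emulating float32 rounding in Python via the standard `struct` module. The names `f32`, `DOWN`, `UP` and `fp_nonlinearity` are illustrative only. The result is the identity for most inputs, zero in a small band around 0, and coarsely quantized in between.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Hypothetical scale factors: exact powers of two, both representable in float32.
DOWN = f32(2.0 ** -100)
UP = f32(2.0 ** 100)

def fp_nonlinearity(x: float) -> float:
    """Scale a float32-range x down, round to float32, scale back up, round again.

    For |x| >= 2**-26 the scaled value stays a normal float32, so the result is f32(x).
    For |x| <= 2**-50 the scaled value underflows to zero, so the result is zero.
    In between, the scaled value is subnormal and gets coarsely quantized.
    """
    small = f32(f32(x) * DOWN)
    return f32(small * UP)

for x in [1.0, -3.5, 2.0 ** -26, 1.3 * 2.0 ** -40, 2.0 ** -50, 1e-20]:
    print(f"{x!r:>24} -> {fp_nonlinearity(x)!r}")
```

Away from zero the function behaves like x => x; near zero it has a flat dead zone, which is the "clamping in the middle" described above rather than at the foot or shoulder.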

There's a theorem that any nonlinearity works (for sufficiently large networks).
The universal approximation theorem actually assumes that the nonlinearity is nonconstant, bounded, monotonically increasing and continuous. I don't think floating point nonlinearities technically satisfy all of that.
0) nonconstant. Yes, for most inputs the floating point nonlinearity maps x => x, so it is not constant.

1) bounded. Yes, the nonlinearities are bounded by the range of the floating point format.

2) monotonically increasing. Yes. Consider a + b, where fp(a + b) < a + b; in other words, the sum has been rounded down. Now examine fp(a + (b - db)) for some db > 0: it cannot be rounded up to a number higher than fp(a + b). So the floating point rounding function fp must be monotonic for the operation +. A similar argument applies to multiplication, and thus to any linear function.

3) continuous function. No. Well, you can't win at everything: no computer representation can be truly continuous. But it's a reasonable approximation of what the approximation theorem assumes; otherwise ML on computers in general would be hopeless.
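The monotonicity argument in 2) and the continuity caveat in 3) can be spot-checked numerically. A minimal sketch, assuming float32 arithmetic is emulated in Python by computing in double and rounding with the standard `struct` module; `f32`, `fp_add`, `fp_mul` and `is_non_decreasing` are hypothetical helpers, and random sampling is evidence, not a proof. It checks that sorted inputs give sorted outputs for fp(a + x) and fp(w * x) with w > 0.

```python
import random
import struct

def f32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def fp_add(a: float, b: float) -> float:
    """Emulated float32 addition: add in double, then round the sum to float32."""
    return f32(f32(a) + f32(b))

def fp_mul(w: float, x: float) -> float:
    """Emulated float32 multiplication: multiply in double, then round to float32."""
    return f32(f32(w) * f32(x))

def is_non_decreasing(values: list[float]) -> bool:
    """True if no element is smaller than the one before it."""
    return all(u <= v for u, v in zip(values, values[1:]))

random.seed(0)
# Sorted float32 inputs in a narrow band around 0.
xs = sorted(f32(random.uniform(-1e-6, 1e-6)) for _ in range(10_000))

sums = [fp_add(1.0, x) for x in xs]      # fp(a + x) with a = 1
products = [fp_mul(0.3, x) for x in xs]  # fp(w * x) with w = 0.3; a negative w would reverse the order

print(is_non_decreasing(sums), is_non_decreasing(products))
print(len(set(xs)), "distinct inputs ->", len(set(sums)), "distinct sums")
```

The sums collapse to far fewer distinct values than there are inputs: the output is non-decreasing but flat between jumps. So "monotonically increasing" holds only in the non-decreasing sense, and those jumps are also why the answer to 3) is "No".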