Hacker News
by lhnz 1156 days ago
Hm, I don't think that's quite it. I recently worked through how neural networks work and wrote this up based on what I learned: https://sebinsua.com/bridging-the-gap

As far as I understand it, you can approximate practically any continuous function with layers of linear transformations, each followed by a non-linear function (e.g. `ReLU(x) = max(0, x)`). It's this sprinkling of non-linearity that lets the networks model complex functions; without it, a stack of linear layers collapses into a single linear transformation.
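
To make the non-linearity point concrete, here's a minimal sketch in plain Python. The function names and the specific weights are made up for illustration. It shows that two stacked linear layers are equivalent to one linear layer, while putting a ReLU in between lets a tiny network represent `|x|`, which no single linear function can do.

```python
def relu(x):
    """ReLU(x) = max(0, x)."""
    return max(0.0, x)


# Illustrative weights only; any values would show the same collapse.
W1, B1, W2, B2 = 2.0, 1.0, -3.0, 0.5


def two_linear_layers(x):
    """Two linear layers applied back to back, no non-linearity."""
    hidden = W1 * x + B1
    return W2 * hidden + B2


def collapsed_single_layer(x):
    """The same function rewritten as one linear layer."""
    return (W2 * W1) * x + (W2 * B1 + B2)


def abs_via_relu(x):
    """A hidden layer of two ReLU units (weights +1 and -1), summed.

    This computes |x|, which is not a linear function of x.
    """
    return relu(1.0 * x) + relu(-1.0 * x)


if __name__ == "__main__":
    for x in (-2.0, 0.0, 1.5):
        print(x, two_linear_layers(x), collapsed_single_layer(x), abs_via_relu(x))
```

The first two columns of output always agree, so depth alone buys nothing without the non-linear step.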

However, from my perspective, the secret sauce is (1) composability and (2) differentiability. Together these enable backpropagation (which is just "the chain rule" from calculus, applied layer by layer), and that is what lets these massive mathematical expressions learn parameters (weights and biases) that perform well.
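
Here's a rough sketch of what that looks like for the smallest network I could think of: `y = w2 * ReLU(w1 * x + b1) + b2` with a squared-error loss. Everything here (the names, the starting parameters, the learning rate, the single training example) is an assumption picked for illustration, not how any real library does it. The backward pass is nothing more than the chain rule applied one step at a time, in reverse order of the forward pass.

```python
def relu(x):
    return max(0.0, x)


def forward(params, x):
    """Forward pass; returns the intermediates the backward pass needs."""
    w1, b1, w2, b2 = params
    z = w1 * x + b1      # linear layer 1
    h = relu(z)          # non-linearity
    y = w2 * h + b2      # linear layer 2
    return z, h, y


def loss_and_grads(params, x, target):
    """Squared-error loss and its gradients via the chain rule."""
    w1, b1, w2, b2 = params
    z, h, y = forward(params, x)
    loss = (y - target) ** 2

    dy = 2.0 * (y - target)              # dL/dy
    dw2 = dy * h                         # dL/dw2 = dL/dy * dy/dw2
    db2 = dy                             # dL/db2 = dL/dy * 1
    dh = dy * w2                         # dL/dh  = dL/dy * dy/dh
    dz = dh * (1.0 if z > 0 else 0.0)    # ReLU passes gradient only when z > 0
    dw1 = dz * x                         # dL/dw1 = dL/dz * dz/dw1
    db1 = dz                             # dL/db1 = dL/dz * 1
    return loss, [dw1, db1, dw2, db2]


def train(params, x, target, lr=0.05, steps=200):
    """Plain gradient descent on a single (x, target) example."""
    for _ in range(steps):
        _, grads = loss_and_grads(params, x, target)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params


if __name__ == "__main__":
    start = [0.5, 0.1, -0.3, 0.0]  # hypothetical initial w1, b1, w2, b2
    before, _ = loss_and_grads(start, 2.0, 1.0)
    after, _ = loss_and_grads(train(start, 2.0, 1.0), 2.0, 1.0)
    print("loss before:", before, "loss after:", after)
```

Frameworks automate exactly this bookkeeping: because every piece is differentiable and the pieces compose, the gradient of the whole expression falls out of the gradients of its parts.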