Hacker News
by falcor84 583 days ago
> It’s like chaining up perceptrons hoping to get more expressive power for free.

Isn't that literally the cause of the success of deep learning? It's not quite "free", but as I understand it, the big breakthrough of AlexNet (and much of what came after) was that running a larger CNN on a larger dataset allowed the model to be so much more effective without any big changes in architecture.

1 comments

Without a non-linear activation function, chaining perceptrons together is equivalent to one large perceptron.
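A quick way to check this is a minimal sketch in plain Python, assuming each "perceptron layer" is just an affine map (weights plus bias) with no activation. The weights `W1`, `b1`, `W2`, `b2` and the helper names are made up for illustration. Two stacked layers give exactly the same outputs as one layer with weights `W2·W1` and bias `W2·b1 + b2`; putting a ReLU between them breaks that equivalence.

```python
# Sketch: stacking affine layers without a nonlinearity collapses to one layer.
# All weights here are arbitrary illustrative integers (hypothetical values).

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def add(u, v):
    """Elementwise vector sum."""
    return [a + b for a, b in zip(u, v)]

def relu(v):
    """Elementwise max(0, x)."""
    return [max(0, a) for a in v]

W1, b1 = [[1, -2], [3, 1]], [1, -1]
W2, b2 = [[2, 1], [-1, 4]], [0, 2]

def stacked_linear(x):
    # Layer 2 applied to layer 1, no activation in between.
    return add(matvec(W2, add(matvec(W1, x), b1)), b2)

# The single equivalent layer: W = W2 @ W1, b = W2 @ b1 + b2.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)

def single_linear(x):
    return add(matvec(W, x), b)

def stacked_relu(x):
    # Same two layers, but with a ReLU between them.
    return add(matvec(W2, relu(add(matvec(W1, x), b1))), b2)

if __name__ == "__main__":
    for x in ([0, 1], [1, 0], [-3, 2], [5, 5]):
        print(x, stacked_linear(x), single_linear(x), stacked_relu(x))
```

For every input the first two outputs agree, while the ReLU version differs on inputs where a hidden unit goes negative (for example `[0, 1]`), which is where the extra expressive power comes from.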
Yep. falcor84: you’re thinking of the so-called ‘multilayer perceptron’ which is basically an archaic name for a (densely connected?) neural network. I was referring to traditional perceptrons.
While ReLU is relatively new, AI researchers have known about the need for nonlinear activation functions, and have been building multilayer perceptrons with them, since the late 1960s, so I had assumed that's what you meant.
It was a deliberately historical example.