
Thanks for the comment.

Yes, the performance boost in training is critical.

So what I'm saying is that this performance boost comes from the switch to non-binary neurons, which allows a reversible operation to transmit a magnitude. That's the MOST important part.

And separately, ReLUs are just better at this: because they are linear for positive inputs, they don't have the saturating "vanishing edges" that a sigmoid has, which helps avoid the vanish/explode problem.
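
To make the "vanishing edges" point concrete, here is a rough sketch, not a real network. It just multiplies the same local derivative together 20 times, the way backprop chains derivatives through 20 layers. The weights are assumed to be fixed at 1 and the input at 2.0, and the function names are made up for illustration.

```python
import math

def sigmoid_grad(x):
    """Derivative of the logistic sigmoid; it peaks at 0.25 and shrinks toward the edges."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 on the active (positive) side, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def chained_grad(grad_fn, x, depth):
    """Product of `depth` identical local derivatives (assumes all weights are 1)."""
    g = 1.0
    for _ in range(depth):
        g *= grad_fn(x)
    return g

sig_20 = chained_grad(sigmoid_grad, 2.0, 20)   # collapses toward zero
relu_20 = chained_grad(relu_grad, 2.0, 20)     # stays at exactly 1.0
print(sig_20, relu_20)
```

With real, non-unit weights the product can also blow up, so this only illustrates the saturation half of the story.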

Separately, I'm glad you brought up CNNs, because CNNs are old and go back to Rosenblatt (1958): his perceptron had a first layer of local connections, based on findings in biological systems.

And of course that's because nature has found it more efficient, and the efficiency gain is huge.

But the point is: CNN = fewer knobs to train.
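
A quick back-of-the-envelope count shows how big the gap is. The 28x28 single-channel image and the 3x3 kernel are my own example sizes, and the helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Fully connected layer: one weight per input/output pair, plus one bias per output."""
    return n_in * n_out + n_out

def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Conv layer: one shared kernel per (in, out) channel pair, plus one bias per output channel."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels

pixels = 28 * 28                        # assumed example: a 28x28 grayscale image
dense = dense_params(pixels, pixels)    # every pixel wired to every output unit
conv = conv_params(3, 3, 1, 1)          # one 3x3 kernel reused at every position
print(dense, conv)                      # 615440 10
```

The conv layer is reusing the same handful of knobs at every position, which is where the saving comes from.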

And there are lots of simple ways to reduce the number of knobs: fix some connections, drop some connections.
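
One simple way to sketch both tricks is a 0/1 mask over the weight matrix: a 0 means the connection gets no update, so it is either dropped (weight stays at zero) or fixed (weight stays at whatever it was). This is just an illustration under that assumption, with made-up helper names, not any particular library's API.

```python
def local_mask(n_in, n_out, width):
    """1 where output j connects to an input i within a local window, 0 where the connection is dropped."""
    half = width // 2
    return [[1 if abs(i - j) <= half else 0 for i in range(n_in)]
            for j in range(n_out)]

def count_trainable(mask):
    """Number of knobs still being trained."""
    return sum(sum(row) for row in mask)

def masked_update(weights, grads, mask, lr):
    """Gradient step that leaves masked-out (fixed or dropped) weights untouched."""
    return [[w - lr * g * m for w, g, m in zip(w_row, g_row, m_row)]
            for w_row, g_row, m_row in zip(weights, grads, mask)]

mask = local_mask(8, 8, 3)
print(count_trainable(mask), "of", 8 * 8)   # 22 of 64

# Second weight is frozen by the mask, so only the first one moves.
print(masked_update([[1.0, 2.0]], [[0.5, 0.5]], [[1, 0]], lr=1.0))  # [[0.5, 2.0]]
```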