by quantadev
618 days ago
You can explain the "effect" of tanh at any level of abstraction you like, up to and including describing things that happen in Semantic Space itself, but my description of what tanh is doing is 100% accurate in the context I used it. All it's doing is squashing a number down to below one. My understanding of how the Perceptron works is fully correct, and isn't missing any details. I've implemented many of them.
You're curious about whether there is any gain in parameterising activation functions and learning them instead, or rather, why that isn't done much in practice. That's an interesting academic question, and it seems you're already experimenting with your own kinds of activation functions. However, people in this thread (including myself) wanted to clarify some perceived misunderstandings you had about nonlinearities and "why" they are used in DNNs, and to point out that "squashing function" is a misnomer, because `g(x) = x/1000` also squashes its input yet doesn't introduce any nonlinearity. Yet you continue to fixate and double down on your knowledge of "what" a tanh is, and even that is incorrect.
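
To make the `g(x) = x/1000` point concrete, here is a minimal sketch in plain Python. The helper names (`scale`, `is_additive`, `two_layer`) and the weights are made up for illustration and are not from anyone's code in this thread. It checks additivity, which any linear map must satisfy, and shows that a bias-free two-layer scalar network using the scaling "activation" is the same function as a single linear layer.

```python
import math

def scale(x):
    # Shrinks magnitudes, but is still a linear map: g(x) = x / 1000
    return x / 1000.0

def is_additive(f, a, b, tol=1e-12):
    # A linear map must satisfy f(a + b) == f(a) + f(b)
    return abs(f(a + b) - (f(a) + f(b))) < tol

def two_layer(x, w1, w2, act):
    # Hypothetical scalar two-layer network without biases: w2 * act(w1 * x)
    return w2 * act(w1 * x)

a, b = 0.5, 2.0
scale_is_additive = is_additive(scale, a, b)
tanh_is_additive = is_additive(math.tanh, a, b)

# Illustrative weights (assumed values, chosen arbitrarily)
w1, w2 = 3.0, -4.0

# With the scaling activation, the two layers fold into one weight
collapsed_weight = w1 * w2 / 1000.0
collapses = all(
    abs(two_layer(x, w1, w2, scale) - collapsed_weight * x) < 1e-12
    for x in (-2.0, -0.5, 0.0, 1.0, 7.0)
)

print("x/1000 additive:", scale_is_additive)
print("tanh additive:", tanh_is_additive)
print("two layers with x/1000 collapse to one:", collapses)
```

The scaling function passes the additivity check and the stacked layers fold into one weight, so depth adds nothing there. tanh fails the same check, which is the property that matters, rather than the fact that its output is small.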