|
|
|
|
|
by quantadev
625 days ago
|
|
Right, it's obvious that the ReLU is just a gating mechanism, and you can think of that as a decision maker. It's like a "pass thru linearly proportionally" or "block" function. But I still find it counter-intuitive that it's not common practice in standard LLM NNs to have a trainable parameter that in some way directly "tunes" whatever Activation Function is being applied on EACH output. For example I almost started experimenting with trigonometric activation functions in a custom NN where the phase angle would be adjusted, inspired by Fourier Series. I can envision a type of NN where every model "weight" is actually a frequency component, because Fourier Series can represent any arbitrary function in this way. There has of course already been similar research done by others along these lines. |
|