by nikolayasdf123 318 days ago
I misread this as if "there is no non-linearity". there is still non-linearity, it is just renamed and reshuffled into new operators. basically renaming apples into oranges.
2 comments

Well, it's more like fruits and vegetables. The author proposed a normalized inner product as replacement for the standard inner product.

It's not an activation function, because it combines the learnable weights of a linear projection (matrix-vector multiplication) with the clamping properties of an activation function, all in one operator.

My personal issue with the proposal is that it essentially doubles the amount of memory needed on-chip.

A Yat-Product GEMV now needs to store both the running total of the inner product and the norms of the input vectors. That's a big cost increase for something that might not improve performance all that much.
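To make the extra state concrete, here is a rough pure-Python sketch of such a matrix-vector product. It assumes the Yat-product has the form (w·x)^2 / (||w - x||^2 + eps), which is not spelled out in this thread; the name `yat_gemv` and the `eps` default are hypothetical. It only illustrates which accumulators are live per output row and says nothing about how a real kernel would lay them out on-chip.

```python
def yat_gemv(W, x, eps=1e-6):
    """Matrix-vector Yat-product sketch: one output per row of W.

    Assumed form (not confirmed by the thread):
        yat(w, x) = (w . x)^2 / (||w - x||^2 + eps)
    """
    # ||x||^2 is shared by every row, so it is computed once.
    x_sq = sum(v * v for v in x)
    out = []
    for row in W:
        dot = 0.0   # accumulator a plain GEMV already needs
        w_sq = 0.0  # extra accumulator for ||w||^2
        for w_i, x_i in zip(row, x):
            dot += w_i * x_i
            w_sq += w_i * w_i
        # ||w - x||^2 = ||w||^2 + ||x||^2 - 2 (w . x); clamp rounding noise.
        dist_sq = max(w_sq + x_sq - 2.0 * dot, 0.0)
        out.append(dot * dot / (dist_sq + eps))
    return out


result = yat_gemv([[1.0, 0.0], [0.0, 1.0]], [0.0, 2.0])
```

For fixed weights, `w_sq` could presumably be precomputed per row, which would trade the extra accumulator for one stored scalar per output.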

that's a great point, but the goal of this research paper is not to improve performance. it is to show that you can train deep neural networks without activation functions or normalization layers.

one simple use case is physics-informed neural networks and neural ODEs, where most activation functions are discouraged, mainly because they aren't infinitely differentiable, so people use tanh or sin most of the time. this kernel i introduced works better than neurons followed by a tanh for solving different PDEs

basically the real "non-linearity" in deep learning has always been orthogonality. squashing functions make it easy for the neurons to tap into that orthogonality, while most activation functions "lie" about it by setting the dot product score to "0", and a dot product of 0 between two vectors means they are orthogonal (linearly independent)

what i did was rely on both the angular information and spatial information between the input x and the weight w to measure how "similar" they are.
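a small sketch of that idea, assuming the yat-product has the form (w·x)^2 / (||w - x||^2 + eps); the formula is not written out in this thread, so treat it and the helper names `yat_product` and `relu_dot` as assumptions for illustration. the squared dot product carries the angular part and the squared distance carries the spatial part. it also shows the ReLU case from above, where an anti-aligned pair gets the same score of 0 as an orthogonal pair.

```python
def dot(w, x):
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


def yat_product(w, x, eps=1e-6):
    # Angular part: squared dot product (0 exactly when w and x are orthogonal).
    # Spatial part: squared Euclidean distance between w and x.
    dist_sq = sum((w_i - x_i) ** 2 for w_i, x_i in zip(w, x))
    return dot(w, x) ** 2 / (dist_sq + eps)


def relu_dot(w, x):
    # Ordinary neuron followed by ReLU, for comparison.
    return max(dot(w, x), 0.0)


w = [1.0, 0.0]
orthogonal = [0.0, 1.0]
anti_aligned = [-1.0, 0.0]

yat_orth = yat_product(w, orthogonal)    # dot is 0, so the score is 0
yat_anti = yat_product(w, anti_aligned)  # dot is -1, distance^2 is 4
relu_orth = relu_dot(w, orthogonal)      # 0: truly orthogonal
relu_anti = relu_dot(w, anti_aligned)    # also 0, although not orthogonal
```

under this assumed form the anti-aligned pair scores about 0.25 instead of 0, so a score of 0 is reserved for vectors that really are orthogonal.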

the lower bound of the yat-product is 0, and it is achieved only when two vectors are orthogonal and away