| HN Mirror



	by throwaway080383 2827 days ago
	Coming from pure math, I often feel this way now that I'm learning statistics and ML. In pure math, it feels like the threshold for how novel a concept should be before it gets its own word is much higher. E.g., we have "regression" and "classification" instead of "supervised continuous prediction" and "supervised discrete prediction".

1 comments

p1esk 2827 days ago

If you don't understand where the name "softmax" came from, you don't really understand what it is. Softmax is a differentiable approximation of the max function.

Plot the functions max(0, x) and softmax(0, x), and it should become clear.
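For anyone who wants to see the comparison without plotting, here is a minimal sketch. It assumes that "softmax(0, x)" is meant as the log-sum-exp of 0 and x, i.e. log(e^0 + e^x), which is only one reading of the comment; the names `hard_max` and `smooth_max` are made up for illustration.

```python
import math

def hard_max(x):
    # The ordinary max(0, x), which has a kink at x = 0.
    return max(0.0, x)

def smooth_max(x):
    # Assumed reading of "softmax(0, x)": log(e^0 + e^x).
    # Rewritten as max(0, x) + log(1 + e^(-|x|)) so exp never overflows.
    return max(0.0, x) + math.log1p(math.exp(-abs(x)))

# Print both functions side by side; they differ most near x = 0
# and agree closely once |x| is large.
for x in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    print(f"x={x:5.1f}  max={hard_max(x):6.3f}  smooth={smooth_max(x):6.3f}")
```

The smooth version always sits slightly above max(0, x), and the gap shrinks quickly as x moves away from 0.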


throwaway080383 2826 days ago

Nit: it seems it's more like a smooth approximation to argmax than to max.

Yeah, it makes sense that this is a super important function, but I still feel like one could just remember the principle that "exponentiation followed by normalization is a smooth approximation to argmax."
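A rough sketch of that principle, in plain Python. The `temperature` parameter is an assumption added here to show the limiting behavior; it is not something mentioned in the thread.

```python
import math

def softmax(xs, temperature=1.0):
    # Exponentiate, then normalize so the outputs sum to 1.
    # Subtracting the largest score first keeps exp from overflowing
    # and does not change the result.
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

scores = [1.0, 3.0, 2.0]

# As the (assumed) temperature shrinks, the output approaches a one-hot
# vector marking the position of the largest score.
for t in (1.0, 0.1, 0.01):
    print(t, [round(p, 3) for p in softmax(scores, t)])
```

The output is a vector of weights over positions rather than a single value, which is why it looks like a smoothed indicator of where the maximum is instead of the maximum itself.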


p1esk 2826 days ago

The basic building blocks of most deep learning models are convolutional layers, pooling layers, fully connected layers, and softmax layers. What do you propose we call the "softmax layer" instead?


TeMPOraL 2826 days ago

Normalization layer?

This opens up the possibility of using something other than softmax there.


p1esk 2826 days ago

Well, there are other building blocks, such as the batch normalization layer or the local contrast normalization layer (not to mention a dozen batchnorm alternatives, e.g. group normalization, weight normalization, layer normalization, instance normalization, etc.).

If you just say "normalization layer", how am I supposed to know which normalization you're talking about?
