	by dkislyuk 54 days ago
	I agree that "it has nice derivatives" is a great empirical reason to use a specific function in ML, but it doesn't prove that it's the best function to use. And even if a derivative term looks more complex, that doesn't necessarily mean it is more expensive to compute, so that can't be the only criterion for selecting a function. Luckily, there are more axiomatic reasons why softmax is the preferred way to map inputs to a probability distribution.
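To make the "nice derivatives" point concrete, here is a minimal sketch, not taken from the comment and with hypothetical function names. It shows that the softmax Jacobian, J[i][j] = s_i * (delta_ij - s_j), is built entirely from the forward output s. Once s is known, the derivative needs no extra exponentials. A finite-difference estimate is included only as a sanity check.

```python
import math


def softmax(z):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(z):
    """Analytic Jacobian J[i][j] = s_i * (delta_ij - s_j), reusing the forward output s."""
    s = softmax(z)
    n = len(s)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
            for i in range(n)]


def numeric_jacobian(z, h=1e-6):
    """Central finite-difference estimate of the same Jacobian, for comparison only."""
    n = len(z)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        zp = list(z)
        zp[j] += h
        zm = list(z)
        zm[j] -= h
        sp, sm = softmax(zp), softmax(zm)
        for i in range(n):
            jac[i][j] = (sp[i] - sm[i]) / (2 * h)
    return jac


if __name__ == "__main__":
    z = [2.0, 1.0, 0.1]  # arbitrary example logits
    analytic = softmax_jacobian(z)
    numeric = numeric_jacobian(z)
    max_err = max(abs(a - b)
                  for row_a, row_b in zip(analytic, numeric)
                  for a, b in zip(row_a, row_b))
    print("sum of probabilities:", sum(softmax(z)))
    print("max |analytic - numeric|:", max_err)
```

Each column of the Jacobian sums to zero. Because the outputs always sum to 1, nudging any single logit can only shift probability mass between classes.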