by dkislyuk
54 days ago
I agree that "it has nice derivatives" is a great empirical reason to use a specific function in ML, but it doesn't prove that it's the best function to use. And even if a derivative term looks more complex, that doesn't necessarily mean it is more expensive to compute, so ease of differentiation can't be the only criterion for selecting a function. Luckily, there are more axiomatic reasons why softmax is the preferred way to map inputs to a probability distribution.
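For concreteness, here is a minimal sketch of the "nice derivative" usually meant, assuming the common pairing of softmax with a cross-entropy loss (the pairing and the function names `softmax`, `cross_entropy`, `cross_entropy_grad` and `numerical_grad` are illustrative assumptions, not from the comment). The gradient of the loss with respect to the logits collapses to `p - onehot(target)`, which the code checks against central finite differences:

```python
import math

def softmax(z):
    # Subtract the max for numerical stability; this does not change the result.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(z, target):
    # Negative log-probability of the target class under softmax(z).
    return -math.log(softmax(z)[target])

def cross_entropy_grad(z, target):
    # Closed form: d/dz_i of -log softmax(z)[target] equals p_i - [i == target].
    p = softmax(z)
    return [p_i - (1.0 if i == target else 0.0) for i, p_i in enumerate(p)]

def numerical_grad(f, z, eps=1e-6):
    # Central finite differences, used only to check the closed form above.
    grad = []
    for i in range(len(z)):
        up = list(z)
        up[i] += eps
        down = list(z)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

z = [2.0, -1.0, 0.5]
analytic = cross_entropy_grad(z, target=0)
numeric = numerical_grad(lambda v: cross_entropy(v, 0), z)
print(analytic)
print(numeric)
```

This only illustrates why the derivative is convenient; as the comment says, convenience of the gradient alone doesn't establish that softmax is the best choice.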