by thaumasiotes
51 days ago
> The stated problem of mapping raw inputs/scores/logits to a probability distribution can be solved by a bunch of arbitrary functions, and the usual justification given for a softmax is "it has nice derivatives" which is empirically useful but not satisfying.

Often there isn't any more to it than that. For example, the entire justification for least-squares error measurement is that it has convenient derivatives.
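
For anyone wondering what "convenient derivatives" looks like concretely, here is a rough sketch. It assumes softmax is paired with a cross-entropy loss (the comment doesn't name a loss, so that pairing is my assumption), and all function names are made up for illustration. Under that assumption the gradient with respect to the logits collapses to `softmax(z) - onehot(target)`, and the squared-error gradient is just `2 * (pred - y)`.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Negative log-probability assigned to the target class.
    return -math.log(softmax(logits)[target])

def cross_entropy_grad(logits, target):
    # Analytic gradient w.r.t. the logits: softmax(z) - onehot(target).
    p = softmax(logits)
    return [p_i - (1.0 if i == target else 0.0) for i, p_i in enumerate(p)]

def squared_error_grad(pred, y):
    # d/dpred of (pred - y)^2 is 2 * (pred - y).
    return 2.0 * (pred - y)

def numeric_grad(f, xs, eps=1e-6):
    # Central finite differences, used only to sanity-check the formulas above.
    grads = []
    for i in range(len(xs)):
        up = list(xs)
        up[i] += eps
        down = list(xs)
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads
```

Comparing `cross_entropy_grad(z, t)` against `numeric_grad(lambda v: cross_entropy(v, t), z)` for a few logit vectors is a quick way to convince yourself the closed form holds.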