by p1esk
2824 days ago
It's not about normalization, it's about the loss function. Softmax is there because cross-entropy minimization (negative log-likelihood, to be precise) needs the outputs to form a valid probability distribution. That loss works somewhat better in practice than mean squared error (MSE) minimization, which needs no normalization of outputs.
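To make the distinction concrete, here is a rough pure-Python sketch. The function names and the example logits are my own, chosen only for illustration. Cross-entropy takes the log of the probability assigned to the true class, so it needs positive outputs that sum to 1, while MSE can be computed directly on the raw outputs.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; outputs are positive and sum to 1.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Negative log-likelihood of the true class under softmax(logits).
    return -math.log(softmax(logits)[target])

def mse(outputs, target):
    # Mean squared error against a one-hot target, using the raw outputs as-is.
    one_hot = [1.0 if i == target else 0.0 for i in range(len(outputs))]
    return sum((o - t) ** 2 for o, t in zip(outputs, one_hot)) / len(outputs)

# Hypothetical raw network outputs for a 3-class problem; the true class is 0.
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)
ce = cross_entropy(logits, 0)
err = mse(logits, 0)
```

Note that `mse` never touches `softmax`: nothing in the squared error requires the outputs to be probabilities, whereas `cross_entropy` would be undefined for a negative raw output like -1.0 without the softmax step.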