Hacker News
by p1esk 2824 days ago
It's not about normalization, it's about the loss function. Cross-entropy minimization (negative log-likelihood, to be precise) needs the outputs to form a probability distribution, and softmax is the usual way to get one. That tends to work somewhat better in practice than mean squared error (MSE) minimization, which needs no normalization of the outputs.
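To make the difference concrete, here is a rough sketch in plain Python. The function names and the example logits are my own, purely for illustration, and the MSE target is assumed to be a one-hot vector. Cross entropy goes through softmax first, while MSE is computed on the raw outputs directly.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Negative log-likelihood of the target class under softmax(logits).
    probs = softmax(logits)
    return -math.log(probs[target])

def mse(outputs, target):
    # Mean squared error against a one-hot target; no normalization needed.
    one_hot = [1.0 if i == target else 0.0 for i in range(len(outputs))]
    return sum((o - t) ** 2 for o, t in zip(outputs, one_hot)) / len(outputs)

# Hypothetical raw network outputs for a 3-class problem, true class 0.
logits = [2.0, 1.0, -1.0]
probs = softmax(logits)
ce = cross_entropy(logits, 0)
err = mse(logits, 0)
```

The log in cross entropy only makes sense when the outputs are positive and sum to 1, which is where softmax comes in. MSE has no such requirement.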