by apstroll 521 days ago
Under a cross-entropy loss, the output activations absolutely do represent a probability distribution, since a distribution is exactly what we're modeling.
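To make that concrete, here's a minimal sketch, assuming the common setup where the output layer is a softmax over raw logits (the comment doesn't say which activation is used, and the function names and logit values below are made up for illustration). The outputs are non-negative and sum to 1, and the cross-entropy against a one-hot label is just the negative log of the probability assigned to the true class.

```python
import math

def softmax(logits):
    """Map raw scores to a probability distribution (assumed output activation)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Cross-entropy against a one-hot target: -log p(true class)."""
    return -math.log(probs[target_index])

# Hypothetical logits for a 3-class problem; class 0 is the true label.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
loss = cross_entropy(probs, 0)

print(probs, sum(probs))  # every entry lies in (0, 1); the sum is 1 up to rounding
print(loss)
```

Minimizing this loss is the same as maximizing the likelihood of the labels under a categorical distribution, which is why reading the outputs as probabilities is the natural interpretation.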