by talolard 1726 days ago
I don’t know if it’s correct, but I often think of a classification model as learning the parameters of a Dirichlet distribution, with the output of the final softmax layer being a sample from it.
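
A rough way to poke at this intuition, as a sketch only: the snippet below treats the exponentiated logits as Dirichlet concentration parameters. That mapping (`alpha = exp(logits)`) and the example logits are assumptions made for illustration, not something a standard classifier is trained to do. Under that assumption the softmax output equals the mean of the Dirichlet, while an actual sample is a noisier point on the same probability simplex.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution: alpha_i / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


rng = random.Random(0)            # fixed seed so the run is repeatable
logits = [2.0, 0.5, -1.0]         # hypothetical logits for a 3-class model

probs = softmax(logits)
alpha = [math.exp(x) for x in logits]   # assumed mapping from logits to concentrations
mean = dirichlet_mean(alpha)
sample = sample_dirichlet(alpha, rng)

print("softmax output:", probs)
print("Dirichlet mean:", mean)
print("one sample    :", sample)
```

Both the softmax output and the sample are non-negative and sum to 1, so they live in the same space. The difference is that the softmax output is deterministic for a given input, which makes it look more like the mean of such a distribution than a random draw from it.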