by quantombone
2720 days ago
The hinge loss and the primal form of the SVM objective are really easy to understand. Every ML 101 class jumps straight into the dual formulation and talks about kernels, RKHS, and all the fancy stuff. Once you realize that a linear SVM isn’t very different from logistic regression, it all starts to make sense (at least it did for me). Key insight of the hinge loss: once something is classified correctly beyond the margin, it incurs a loss of zero. Now, something fun to think about. Draw the hinge loss. Now draw the ReLU (which is found all over the place in CNNs). Now think about L1 regularization (which was used to induce sparsity in compressed sensing). They are more similar in form than you would think.
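To make the "similar in form" point concrete, here is a minimal sketch in plain Python. It assumes labels in {-1, +1} and a raw score from the model; the function names are made up for illustration. The hinge loss is a ReLU applied to `1 - y * score`, and the absolute value used in the L1 penalty is the sum of two mirrored ReLUs. The logistic loss is included only to show the comparison with logistic regression.

```python
import math

def relu(x):
    # max(0, x): flat at zero on one side, linear on the other
    return max(0.0, x)

def hinge_loss(y, score):
    # y is the label in {-1, +1}; score is the raw model output.
    # Zero once the example is on the correct side by a margin of at least 1.
    return relu(1.0 - y * score)

def logistic_loss(y, score):
    # Smooth cousin of the hinge: never exactly zero, but decays quickly.
    return math.log1p(math.exp(-y * score))

def l1_penalty(w):
    # |w| written as two ReLUs glued together at the origin.
    return relu(w) + relu(-w)

# Compare both losses for a positive example as the score grows.
for score in [-2.0, -1.0, 0.0, 0.5, 1.0, 2.0, 3.0]:
    print(f"score={score:+.1f}  hinge={hinge_loss(1, score):.3f}  "
          f"logistic={logistic_loss(1, score):.3f}")
```

All three functions are piecewise linear with a kink, and that kink is what produces exact zeros: zero loss beyond the margin for the hinge, zero activations for the ReLU, and zero weights under L1.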
|
Some people have had good luck with hinge or multi-hinge loss for neural networks instead of the almost universal log loss. The hinge loss isn't tied to linear models, of course: it only looks at the y output and doesn't care how you got it.
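A rough sketch of that idea, in plain Python: the loss below only sees a list of class scores, so it works the same whether they come from a linear model or a small network. "Multi-hinge" can mean several things; this assumes the sum-over-wrong-classes variant (Weston-Watkins style), and both score functions and their weights are hypothetical, made up purely for illustration.

```python
def multiclass_hinge(scores, target, margin=1.0):
    # Penalize every wrong class whose score comes within `margin`
    # of the correct class's score.
    correct = scores[target]
    return sum(max(0.0, margin + s - correct)
               for j, s in enumerate(scores) if j != target)

def matvec(weights, x):
    # Plain matrix-vector product on nested lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def linear_scores(x):
    # Hypothetical linear model: three classes, two input features.
    weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    return matvec(weights, x)

def mlp_scores(x):
    # Hypothetical one-hidden-layer network with a ReLU in the middle.
    w1 = [[1.0, -1.0], [-1.0, 1.0]]
    w2 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    hidden = [max(0.0, h) for h in matvec(w1, x)]
    return matvec(w2, hidden)

x = [2.0, 0.0]
for name, model in [("linear", linear_scores), ("mlp", mlp_scores)]:
    scores = model(x)
    print(name, scores, multiclass_hinge(scores, target=0))
```

The loss function never inspects the model, which is the whole point: swap in any function that returns one score per class.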