Hacker News
by uoaei 2281 days ago
This is encouraging exactly what I dislike about the current ML-in-industry space, namely the fetishism surrounding being able to describe the rote steps of an algorithm and all this "X from scratch" stuff. It's good to know "this algorithm is used for binary classification" but there are so many subtleties to how the data is reckoned with through these algorithms and how that particular representation of the problem maps onto your current business task.

For instance, I'm doing a project that involves binary classification, but I already know that linear SVMs would be a terrible idea because the hinge loss only focuses on two data points and essentially ignores all the rest. Logistic regression is much more appropriate for my needs because it directly optimizes the estimated probability of belonging to one class or the other, that literally being the definition of the objective function. This, though, doesn't really sink in without significant practical experience, and definitely wouldn't stick if it were recited to you from the front of a lecture hall or from one of a couple hundred flash cards.
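To make the difference between the two objectives concrete, here is a minimal sketch (an illustration, not code from the project mentioned above) of the per-example hinge and logistic losses written as functions of the margin m = y * f(x), with labels y in {-1, +1}. The function names are hypothetical. What it shows is narrow: the hinge loss and its subgradient are exactly zero for any point with margin above 1, while the logistic loss keeps a small nonzero gradient for every point.

```python
import math

def hinge_loss(margin):
    # max(0, 1 - m): zero once the point is beyond the margin
    return max(0.0, 1.0 - margin)

def hinge_grad(margin):
    # Subgradient w.r.t. the margin: -1 inside the margin, 0 beyond it
    return -1.0 if margin < 1.0 else 0.0

def logistic_loss(margin):
    # log(1 + exp(-m)), written to avoid overflow for large |m|
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def logistic_grad(margin):
    # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m)); never exactly zero in theory
    if margin >= 0:
        e = math.exp(-margin)
        return -e / (1.0 + e)
    return -1.0 / (1.0 + math.exp(margin))

for m in (-2.0, 0.0, 0.5, 1.0, 3.0):
    print(f"m={m:+.1f}  hinge={hinge_loss(m):.3f} (grad {hinge_grad(m):+.0f})  "
          f"logistic={logistic_loss(m):.3f} (grad {logistic_grad(m):+.3f})")
```

Every point with margin below 1 contributes to the hinge objective, so how many points end up mattering depends on how much the classes overlap.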

2 comments

On-the-job experience with which ML algorithm is more useful for a task is certainly helpful.

But, knowing how to implement "X from scratch" gives you far more useful information about how and why a particular algorithm is suitable for a certain task. We should encourage people to reimplement algorithms from scratch (for fun, not production usage) so that they understand how the sausage is made.

Also, SVMs support a whole lot of different loss functions. I suspect that a properly tuned SVM will do better than a properly tuned logistic regression.

> "SVMs would be a terrible idea because the hinge loss only focuses on two data points and essentially ignores all the rest"

Not true. In my experience, fitted SVMs have thousands of support vectors. The hinge loss is supposed to be less sensitive to outliers.
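A rough way to check the support-vector point without any library: the sketch below fits a soft-margin linear SVM by full-batch subgradient descent on synthetic, overlapping 2D data, then counts the points that sit on or inside the margin (y * f(x) <= 1), which is what the support vectors are at the optimum. Everything here is an assumption for illustration: the data generator, the hyperparameters `lam`, `epochs`, and `lr`, and the function names. Subgradient descent only approximates the true optimum, so treat the count as indicative.

```python
import random

def make_overlapping_data(n=200, seed=0):
    # Two Gaussian blobs centred at (+1, +1) and (-1, -1); they overlap on purpose
    rng = random.Random(seed)
    data = []
    for i in range(n):
        y = 1 if i % 2 == 0 else -1
        data.append(((rng.gauss(y, 1.0), rng.gauss(y, 1.0)), y))
    return data

def decision(w, b, x):
    return w[0] * x[0] + w[1] * x[1] + b

def fit_linear_svm(data, lam=0.01, epochs=200, lr=0.1):
    # Minimise (lam / 2) * ||w||^2 + mean hinge loss by subgradient descent
    w, b, n = [0.0, 0.0], 0.0, len(data)
    for _ in range(epochs):
        gw, gb = [lam * w[0], lam * w[1]], 0.0
        for x, y in data:
            if y * decision(w, b, x) < 1.0:  # only margin violators contribute
                gw[0] -= y * x[0] / n
                gw[1] -= y * x[1] / n
                gb -= y / n
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    return w, b

def count_margin_points(data, w, b):
    # Points on or inside the margin: the candidates for support vectors
    return sum(1 for x, y in data if y * decision(w, b, x) <= 1.0)

data = make_overlapping_data()
w, b = fit_linear_svm(data)
n_sv = count_margin_points(data, w, b)
print(f"{n_sv} of {len(data)} points lie on or inside the margin")
```

On data that is not linearly separable, every misclassified point is inside the margin by definition, so the count is well above two.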

In general, I think SVMs are a 'terrible idea' for two reasons: you can often get better fits at much faster run times with gradient boosting, and otherwise you have to spend a lot of time getting the kernel just right.

I'm eschewing kernels entirely and just sticking to linear models for reasons around interpretability (need to convert the model's coefficients+intercept to an explicit Boolean statement). But you're right that kernel methods are more flexible (maybe too flexible).
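One possible reading of "explicit Boolean statement", sketched under assumptions: for a linear model the predicted class is 1 exactly when w·x + b > 0, which for logistic regression corresponds to a predicted probability above 0.5. The coefficients, intercept, feature names, and helper functions below are all hypothetical placeholders, not values from the project described above.

```python
import math

def linear_rule(coefs, intercept, names):
    # Render the decision rule as a readable inequality
    terms = " + ".join(f"({c:g} * {n})" for c, n in zip(coefs, names))
    return f"{terms} + ({intercept:g}) > 0"

def score(coefs, intercept, x):
    return sum(c * xi for c, xi in zip(coefs, x)) + intercept

def predict_rule(coefs, intercept, x):
    # Boolean form of the classifier
    return score(coefs, intercept, x) > 0

def predict_proba(coefs, intercept, x):
    # Logistic regression probability for class 1
    return 1.0 / (1.0 + math.exp(-score(coefs, intercept, x)))

# Hypothetical fitted parameters, for illustration only
coefs, intercept, names = [0.8, -1.5], 0.25, ["x1", "x2"]
rule = linear_rule(coefs, intercept, names)
print(rule)

for x in [(1.0, 0.2), (0.0, 1.0)]:
    print(x, predict_rule(coefs, intercept, x),
          round(predict_proba(coefs, intercept, x), 3))
```

The rule uses the 0.5 probability cutoff; a different cutoff p would shift the right-hand side from 0 to log(p / (1 - p)).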