Hacker News
by azakai 5014 days ago
> Neural Networks are gradually taking over from simpler Machine Learning methods

And haven't SVMs and such gradually taken over from Neural Networks?

And Random Forests have taken over from SVMs ;)

In all seriousness, when you look at what's happening both in practice and in academia, I would say Random Forests, SVMs and Neural Networks are roughly on par and have different strengths. If you've just got rows and rows of data with numeric, categorical and missing values, it's hard to beat the speed and quality of shoving it into a Random Forest. However, to my knowledge SVMs are still better at NLP categorization tasks and at handling sparse, high-dimensional data. And Neural Networks always seem to pop up solving very weird and/or hard problems.
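
To make "sparse, high-dimensional" concrete, here's a minimal sketch assuming scikit-learn is installed: TF-IDF features (a sparse matrix with one column per vocabulary word) fed to a linear SVM. The six-document corpus and its labels are made up for illustration; this doesn't show SVMs beating anything, only the kind of data meant.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy corpus; real NLP tasks have thousands of documents and
# vocabularies with tens of thousands of words.
docs = [
    "cheap pills buy now limited offer",
    "win money now click here",
    "buy cheap watches online offer",
    "meeting moved to tomorrow morning",
    "please review the attached project report",
    "lunch with the team on friday",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Each document becomes a sparse row vector: most entries are zero because
# any one document uses only a tiny fraction of the vocabulary.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# A linear SVM copes well with many sparse features.
clf = LinearSVC(C=1.0)
clf.fit(X, labels)

prediction = clf.predict(vectorizer.transform(["buy cheap pills online"]))[0]
print(X.shape, issparse(X), prediction)
```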

Well, not quite. SVMs did gain a lot of popularity for having nice properties, e.g.

  1) training is a convex problem, so any local optimum is the global one, and a lot of already existing optimization technology can be used
  2) the "kernel trick", which enables us to learn in complicated feature spaces without explicitly computing the transformations
  3) they can be trained online, which makes them great for huge datasets (here point 2 might not apply, but there are ways around that; if someone's interested I can point out some papers)
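
For point 3, here's a minimal sketch of online training for a *linear* SVM (so no kernel trick), loosely following the Pegasos stochastic sub-gradient method. It assumes NumPy; the synthetic data, the regularization constant `lam` and the helper name `pegasos_step` are illustrative choices, not code from any paper. Each example is used exactly once, as if it had just arrived from a stream.

```python
import numpy as np


def pegasos_step(w, x, y, t, lam):
    """One stochastic sub-gradient step on lam/2 * ||w||^2 + max(0, 1 - y * w.x).

    Uses the Pegasos step size 1 / (lam * t); the optional projection step of
    the original algorithm is omitted to keep the sketch short.
    """
    eta = 1.0 / (lam * t)
    shrunk = (1.0 - eta * lam) * w      # gradient step for the regulariser
    if y * np.dot(w, x) < 1.0:          # margin violated: hinge loss is active
        return shrunk + eta * y * x
    return shrunk


rng = np.random.default_rng(0)
n, d = 2000, 2
X = rng.normal(size=(n, d))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)  # synthetic, linearly separable labels
X = np.hstack([X, np.ones((n, 1))])             # constant feature plays the role of a bias

w = np.zeros(d + 1)
lam = 0.01
for t, (x_t, y_t) in enumerate(zip(X, y), start=1):
    # One pass, one example at a time: memory use does not grow with n.
    w = pegasos_step(w, x_t, y_t, t, lam)

accuracy = np.mean(np.sign(X @ w) == y)
print(f"accuracy after a single online pass: {accuracy:.3f}")
```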

There is now an ongoing craze about deep belief networks (DBNs), developed by Hinton et al. (Hinton is teaching this course), who came up with an algorithm that can train them reasonably well (there are still local optima and such, so it's far from ideal). Some of the reasons they're popular:

  1) They seem to be the winning algorithm in many competitions and on many datasets, ranging from classification in computer vision to speech recognition and, if I'm not mistaken, even parsing. For example, they are used in the newer versions of Android.
  2) They can be used in an unsupervised mode to _automatically_ learn different representations (features) of the data, which can then be used in subsequent stages of the classification pipeline. This makes them very interesting because while labelled data might be hard to come by, we have a lot of unlabelled datasets thanks to the Internet. For an example of what they can do, see the work by Andrew Ng's group in which they automatically learned a cat detector.
  3) They're "similar" to biological neural networks, so one might think they have the necessary richness for many interesting AI applications.
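
Point 2 can be sketched with scikit-learn's `BernoulliRBM`, assuming scikit-learn is installed. This is a simplification: it trains a single restricted Boltzmann machine layer (the building block that DBNs stack) without labels, then hands its hidden-unit activations to an ordinary supervised classifier. It is not the full layer-wise pre-training plus fine-tuning procedure, and the hyperparameters are illustrative, not tuned.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import BernoulliRBM

# 8x8 handwritten digits bundled with scikit-learn; pixel values are 0..16.
X, y = load_digits(return_X_y=True)
X = X / 16.0  # BernoulliRBM expects inputs in [0, 1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Stage 1 (unsupervised): the RBM only sees pixels, never the labels, so in
# principle it could be trained on a much larger unlabelled set.
rbm = BernoulliRBM(n_components=100, learning_rate=0.06, n_iter=20, random_state=0)
H_train = rbm.fit_transform(X_train)  # P(hidden unit = 1 | pixels) per sample
H_test = rbm.transform(X_test)

# Stage 2 (supervised): a plain classifier on top of the learned features.
clf = LogisticRegression(max_iter=1000)
clf.fit(H_train, y_train)
accuracy = clf.score(H_test, y_test)
print(f"test accuracy using RBM features: {accuracy:.3f}")
```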

Enlightening response! Could you please post links to papers that explain online training of SVMs?

Also, I found this paper [1] on unsupervised feature detection; if you have any additional material, I'd really appreciate it if you could post it!

[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.44....

Here's the problem: there is no silver bullet in Machine Learning, and each of these approaches (SVMs, Neural Nets, Random Forests, PGMs, etc.) has pros and cons that depend on many variables, for example:

- How much data do you have relative to the dimensionality?

- How "easy" do you suspect your problem to be? Is it likely linearly separable? Equivalently, how good are your features?

- Do you have a lot of mixed data? Missing data? Categorical/binary data mixed in? (Better use a Forest, perhaps!)

- Do you need training to be very fast?

- Do you need testing to be very fast on new out of sample data?

- Do you need a space-efficient implementation?

- Would you prefer a fixed-size (parametric) model?

- Do you want to train the algorithm online as the data "streams" in?

- Do you want confidences or probabilities about your final predictions?

- How interpretable do you want your final model to be?

etc. etc. etc. Therefore, it doesn't make sense to talk about one method being better than another across the board.

One thing I will say is that, as far as I am aware, Neural Nets have had a fair amount of success in academia (which should be taken with a grain of salt!), but I haven't seen them win many Kaggle competitions or similar real-world challenges. SVMs and Random Forests have largely become the weapons of choice there.

Neural Nets do happen to be very good when you have a LOT of data in relatively low-dimensional spaces. Many tasks, such as word recognition in audio or aspects of vision, fall into this category, and Google, Microsoft and others have incorporated them into their pipelines (which is much more revealing than a few papers showing higher bars for Neural Networks). In these scenarios, Neural Nets will parametrically "memorize" the right answers for all inputs, so you don't have to keep the original data around, only the connection weights.
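
One way to see the "keep only the weights" point, assuming scikit-learn: a kernel SVM has to store its support vectors, which are actual training points, and on noisy data their number tends to grow with the training set, whereas a neural net's parameter count is fixed by its architecture. The dataset, the sizes and the layer width below are arbitrary illustrative choices, and `n_weights` is a helper made up for this sketch.

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def n_weights(mlp):
    """Total number of learned weights and biases in a fitted MLP."""
    return sum(W.size for W in mlp.coefs_) + sum(b.size for b in mlp.intercepts_)


results = {}
for n in (200, 2000):
    # Noisy two-class data: the classes overlap, so many points end up as support vectors.
    X, y = make_moons(n_samples=n, noise=0.3, random_state=0)
    svm = SVC(kernel="rbf").fit(X, y)
    mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)
    results[n] = (len(svm.support_vectors_), n_weights(mlp))
    print(f"n={n}: SVM keeps {results[n][0]} support vectors, MLP keeps {results[n][1]} parameters")
```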

Anyway, I wrote a smaller (and related) rant on this topic on G+: https://plus.google.com/100209651993563042175/posts/4FtyNBN5...

The way it's worded is not 100% clear. Hinton, who is an excellent lecturer and explainer, is talking about neural nets trained with "deep learning" techniques (not vanilla single-hidden-layer nets), which have had striking success at hard vision problems that have been difficult to solve top-to-bottom with SVMs (e.g., you could get good performance from an SVM, but you'd have to go on a hunt for good low-level features first).

That said, there is a rather unhelpful herd mentality in the field, with people moving from one Next Big Thing to another, disparaging the previous Big Thing along the way.

"SVMs. . .3)can be trained online, which makes them great for huge datasets (here the point 2) might not apply - but there exist ways - if someone's interested I can point out some papers)"

Please do. I'd like to read more about SVMs, since I haven't heard that much about them.

I am not an expert in SVMs, but I consider myself fairly experienced in machine learning. In my professional experience, the answer to your question is 'not quite'. SVMs have solved some problems very well, but I've had issues with them:

1. They are only for classification, and not every problem is classification. The other big category is regression: for example, predicting the sale price of a home rather than predicting a binary "will it sell" outcome.

2. They don't have a natural probabilistic interpretation for classification. Neural networks for classification (with a logistic activation function) are trained to predict a probability, not make a simple binary decision. In practice this probability is usually very useful, although I believe SVMs have been modified to give some kind of probability.

3. I have had a tough time getting them to run quickly. Linear-kernel SVMs are fast but not very powerful; more complex kernels (e.g. RBF) are more powerful but can be very slow on moderately large datasets.
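
A small illustration of the "fast but not powerful" side of point 3, assuming scikit-learn: on two concentric rings, which no straight line can separate, a linear-kernel SVM does only a little better than guessing, while an RBF-kernel SVM separates the classes. This only shows the expressiveness half of the trade-off; the cost shows up at larger sizes, since scikit-learn's documentation notes that `SVC` fit time scales at least quadratically with the number of samples.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two noisy concentric rings: class 0 on the outer ring, class 1 on the inner one.
X, y = make_circles(n_samples=400, factor=0.5, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma="scale").fit(X, y)

# Training accuracy is enough here: the point is what each model can represent.
linear_acc = linear.score(X, y)
rbf_acc = rbf.score(X, y)
print(f"linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")
```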

SVMs are very much used for regression as well:

http://scikit-learn.org/stable/modules/svm.html#regression

Note: the scikit-learn implementation of SVMs is based on libsvm:

http://www.csie.ntu.edu.tw/~cjlin/libsvm/
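
For completeness, a minimal regression sketch using scikit-learn's `SVR` class from the page linked above; the noisy sine data and the `C` and `epsilon` values are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 5.0, size=(200, 1)), axis=0)  # one input feature
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)    # noisy real-valued target

# epsilon sets the half-width of the tube inside which errors are not penalised.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
model.fit(X, y)

pred = model.predict(np.array([[1.5]]))[0]
print(f"prediction at x=1.5: {pred:.3f} (true sin(1.5) = {np.sin(1.5):.3f})")
print(f"support vectors: {len(model.support_)} of {len(X)} training points")
```

Only points on or outside the ε-tube become support vectors, which is the regression analogue of only margin-violating points mattering in classification.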

Interesting. A quick glance at a paper on SVRs indicates that they work in roughly the opposite manner to an SVM: in an SVM you try to get as many points as possible far away from the separator (on the correct side for their class), whereas in regression you try to keep points close to the fitted function.

Do you have much background using them? I'm curious how they perform on real-world tasks.

Yeah, there's the SVR "pipe" concept (more commonly called the ε-insensitive tube), where you fit the margin so that points fall inside or close to it. It's a great alternative use of the SVM objective-function optimization.

I haven't really used SVRs aside from some exploratory work, so I can't speak too much about them. But I know they exist!
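
The "opposite manner" discussed above shows up directly in the two loss functions. A minimal NumPy sketch (the function names are made up here, not a library API): the SVM hinge loss is zero for points on the correct side of the separator and outside the margin, while SVR's ε-insensitive loss is zero for points inside the tube around the fitted function and grows linearly as they move away from it.

```python
import numpy as np


def hinge_loss(y, f):
    """SVM classification loss for labels y in {-1, +1} and scores f = w.x + b.

    Zero once y * f >= 1, i.e. the point is on the correct side and at least
    a margin's width away from the separator.
    """
    return np.maximum(0.0, 1.0 - y * f)


def epsilon_insensitive_loss(y, f, eps=0.1):
    """SVR loss for real-valued targets y and predictions f.

    Zero while |y - f| <= eps (inside the tube), then grows linearly.
    """
    return np.maximum(0.0, np.abs(y - f) - eps)


# Classification: being far from the separator (on the right side) costs nothing.
clf_losses = hinge_loss(np.array([1.0, 1.0, 1.0]), np.array([3.0, 0.5, -1.0]))
# Regression: being far from the fitted function is exactly what costs.
reg_losses = epsilon_insensitive_loss(np.array([1.0, 1.0, 1.0]), np.array([1.05, 1.5, 3.0]))
print(clf_losses, reg_losses)
```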

1. You can definitely modify an SVM to be used for regression. As far as I know, most standard SVM libraries support regression, and I have personally used them very successfully for this task [0].

2. There are actually ways to transform the output of an SVM to give it a probabilistic interpretation [1]. But I'll agree that it doesn't have a 'natural' probabilistic interpretation.

3. This is definitely correct, but I'm not sure NNs are that much better.
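
On point 2, the usual trick is Platt scaling: fit a sigmoid that maps the SVM's raw decision values to probabilities, using data the SVM was not trained on. Below is a simplified sketch with scikit-learn, assuming a plain (L2-regularized) logistic regression on the decision values is an acceptable stand-in for Platt's exact fitting procedure, which also smooths the target labels. `svm_proba` is a hypothetical helper name. scikit-learn also packages this as `CalibratedClassifierCV(method="sigmoid")`, and `SVC(probability=True)` uses a cross-validated variant internally.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
# Keep a separate calibration set: fitting the sigmoid on the SVM's own
# training data tends to give over-confident probabilities.
X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)

svm = LinearSVC(C=1.0).fit(X_train, y_train)

# Raw SVM outputs: signed, unbounded scores, not probabilities.
scores_cal = svm.decision_function(X_cal).reshape(-1, 1)

# Fit P(y = 1 | score) = 1 / (1 + exp(-(a * score + c))) on the held-out scores.
platt = LogisticRegression().fit(scores_cal, y_cal)


def svm_proba(X_new):
    """Calibrated probability of class 1 for new points."""
    scores = svm.decision_function(X_new).reshape(-1, 1)
    return platt.predict_proba(scores)[:, 1]


probs = svm_proba(X_cal[:5])
print(svm.decision_function(X_cal[:5]))
print(probs)
```

The first few calibration points are reused here only to show the output format; in practice you would call `svm_proba` on genuinely new data.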

[0] http://www.svms.org/regression/

[1] http://www.cs.colorado.edu/~mozer/Teaching/syllabi/6622/pape...