Hacker News new | ask | show | jobs
by SeanLuke 5019 days ago
All machine learning techniques build models for you.

ADTrees, Random Forests, and Decision trees are all classifiers, so they serve this guy's needs. Genetic Programming is not a machine learning technique: it is an optimization method for a very specific task and is quite unsuited for his purposes.

3 comments

No, I am certain you are confusing Genetic Programming [1] for Genetic Algorithms [2]. I am a fan (do a lot with them) of the prior but have never really been impressed by the latter.

Wikipedia says Genetic Programming is a specialization of genetic algorithms where the objects are programs. I think this is the only thing I have ever seen where the specialization turns out to be more general than the parent.

I'd also like to point out that I although all ML learns models, all those I gave can learn stuff as what amounts to a bunch of IF-THEN statements. Well GP need not be so limited. And Random Forests lose the trees in the forest (harder to interpret) but yeah, that's why I formed that particular list.

[1] http://en.wikipedia.org/wiki/Genetic_programming

[2] http://en.wikipedia.org/wiki/Genetic_algorithm

> No, I am certain you are confusing Genetic Programming [1] for Genetic Algorithms [2].

Trust me. I am not.

Then you will know that Genetic Programming can be used for classification, structured learning, symbolic regression and even meta learning [1]. For classification you can build a program that searches for a program that best uses the input to predicts classes. I have done this (there are better heuristics than the old kind that lead to really big dumb programs). Or you can combine boosting with Genetic Programming or you can use them in Learning Classifier Systems. All these lead to classification based on genetic programming.

[1] http://en.wikipedia.org/wiki/Eurisko

As an aside, My definition of Machine learning is more inclusive than yours. I mean if you are going to separate out optimization then I guess you don't count stochastic gradient descent or Matrix factorization as part of machine learning? Machine learning is basically the combination of statistics and optimization where you can work with a lot of data and the output of your computation is more important than the model.

"Wikipedia says Genetic Programming is a specialization of genetic algorithms where the objects are programs. I think this is the only thing I have ever seen where the specialization turns out to be more general than the parent."

I think GP is "not only" for evolving assembly programs -- http://www.genetic-programming.org/combined.html

Can you explain you think Genetic Programming is unsuitable? I honestly can't see why you think it wouldn't work.
Obviously various optimization techniques can be used essentially as poor-man's machine learners. That doesn't mean they're particularly good at it.

In this case, Genetic Programming uses trees as its representation, and so it could be used as a tree-based classifier, but the OP has already indicated that he expects enormous numbers of rules, necessitating huge trees. For Genetic Programming, this translates into a gigantic search space.

As it was my thesis work, I'm always ready to promote Genetic Programming where appropriate. This is not the place. There are better tools for this task.

I do not think that he knew thousands of if-then rules would be needed, likely his estimate.

But that aside I believe that such a search space could be handled by GP. That very problem is what I've been tackling for sometime now. Its position on my list reflects that particular bias. But there are techniques. I use EDA's like MOSES and was heavily inspired by http://www-ia.hiof.no/~rolando/ but have my own original contributions. The method is also interactive for tough problems, these really help in reducing the search space. Although I have not succeeded in a use yet, I have been thinking hard about how category theory could be used to constrain space or perform particular automatic transofrmations (https://www.cs.ox.ac.uk/jeremy.gibbons/publications/origami...., https://www.cs.ox.ac.uk/people/jeremy.gibbons/publications/a...).

As SeanLuke said, genetic programming is an optimization method. I don't see a function being optimized here, so what would you run it on?
If you had samples of cow behavior such as videos, you could then simulate behavior at each step and optimize for a genetic algorithm that produces similar behaviors. In this case it's probably not the right solution, but it's not that hard to swap out different learning algorithms when things don't seem to jell.
Maybe, but classification and optimization are two different problems. There's no function for the poster to optimize here.
I have already explained in the post above why optimization is very a much a part of machine learning. Now, you say that that classification and optimization are two different things and that is true. But really, it can be more fruitful to look at supervised learning as a special case of unsupervised learning [1]. It is best to seek the most general framework from which to understand things as it leads to a deeper understanding, broader applicability of concepts and easier cross fertilization across fields.

For example, understanding the spectral theorem makes SVD (hence PCA) and the DFT class of algorithms much clearer. Understand the notion of Lp-Norms, convexity, adjoints, loss functions and regularization and a whole bunch of seemingly different algorithms collapse into facets of the same thing. Hook it up to automatic differentiation then some optimization algorithms and you can write anything from Neural networks, SVMs, regularized logistic regression to Non negative tensor factorization in a few lines. You stop making arbitrary divisions between classification or optimization. Much the same kind of collapse can be done for the dual [2] notion of probabilistic algorithms by thinking in terms of graphs, simplices, parametrizations, families and conjugacy.

The best thing from all this is you stop thinking of which algorithm should I use and start thinking of what do I want to do? What is the best mathematical model for this? What would really be great would be a machine learning language. Where one could work with things akin to folds and maps on various structures and manifolds and disappear the incidental complexity. Stuff like [3] is really encouraging for that direction.

[1] The problem of learning a distribution usually is called unsupervised learning, but in this case, supervised learning formally is a special case of unsupervised learning; if we admit that all the functional relations or associations that we are trying to learn have any element of noise or stochasticity, then this connection between supervised and unsupervised problems is quite general.

http://www.princeton.edu/~wbialek/our_papers/bnt_01a.pdf

[2] http://golem.ph.utexas.edu/category/2007/01/duality_between_...

[3] http://www.ipam.ucla.edu/publications/gss2012/gss2012_10605....

I agree with everything you said. I never said optimization isn't a part of ML, it was very much a part of my ML masters, in fact. I was just saying that, in this case, the OP doesn't have an explicit function to optimize, hence why he didn't need optimization...
For all we know he could have produced the thousand ifs from the solution of a decision tree.