Hacker News
by joshhart 5353 days ago
Boosting, which he mentioned, is an ensemble method, so I assume the parent is familiar with them.

Ensemble methods combine multiple weak classifiers into a single stronger one. I think the parent was thinking of the reverse of this, although that idea seems pretty alien to me.

2 comments

Yes, I'm familiar with ensemble methods, I use them a lot for classification. But those are not really what I'm thinking about (I'm still groping towards concrete ideas here, so forgive me if the following is a bit vague). Perhaps my saying "the reverse of boosting" is not really an accurate way to put this, in retrospect, so let me clarify.

Ensemble methods typically take several weak learners, distinct either by method or by training, and merge their predictions into one strong hybrid by smoothing, averaging, or otherwise combining the results. They are still vulnerable to overtraining, though, and they're not very good at generalizing from small amounts of data, because the individual weak learners don't learn from each other or from context.
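To make the "combine the predictions" step concrete, here is a minimal sketch of a majority-vote ensemble. Everything in it is an assumption for illustration: the toy data, the `make_feature_learner` helper, and the choice of plain voting (rather than boosting or averaging) are hypothetical and not taken from any particular library or from the discussion above.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Features = Tuple[int, int, int]
Classifier = Callable[[Features], int]

def make_feature_learner(index: int) -> Classifier:
    """Hypothetical weak learner: it just echoes one feature as its prediction."""
    return lambda features: features[index]

def majority_vote(learners: Sequence[Classifier], features: Features) -> int:
    """Combine the weak learners by returning the most common prediction."""
    votes = Counter(learner(features) for learner in learners)
    return votes.most_common(1)[0][0]

def accuracy(predict: Classifier, data: List[Tuple[Features, int]]) -> float:
    """Fraction of examples where the prediction matches the label."""
    return sum(predict(x) == y for x, y in data) / len(data)

# Toy data (made up): each feature is a noisy copy of the label,
# and each feature is wrong on a different pair of examples.
data: List[Tuple[Features, int]] = [
    ((1, 1, 0), 1),
    ((1, 0, 1), 1),
    ((0, 1, 1), 1),
    ((0, 0, 1), 0),
    ((0, 1, 0), 0),
    ((1, 0, 0), 0),
]

learners = [make_feature_learner(i) for i in range(3)]
individual_scores = [accuracy(learner, data) for learner in learners]
ensemble_score = accuracy(lambda x: majority_vote(learners, x), data)

print("individual:", individual_scores)  # each learner gets 4 of 6 right
print("ensemble:  ", ensemble_score)     # the vote gets all 6 right
```

The vote only helps here because the learners make their mistakes on different examples. Note also that the learners never see each other's output, which is the limitation described above.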

My theory is that we might be able to get rid of the ensemble and tolerate massive overtraining without detriment if, instead of merely combining results, we took a recursive approach and let the classifier use its output as input at another level. My thought is that overtraining on some patterns could be mitigated by the ability to recognize error due to overtraining as a pattern at a different depth of recursion.

This obviously would not be generally applicable to weak learners; it would only apply to a particular subset of learners, and that's where my thoughts get a lot muddier and more speculative.

My really wild speculation: in the limit, if you set something like this up in the right way, you might be able to come up with an efficient approximation to Solomonoff induction as restricted to the subset of patterns that you're actually exposed to, rather than over the entire set of possible inputs. If I'm correct about that, it would enable staggeringly effective learning within a domain, as long as the domain itself displayed patterns that had some sort of underlying order.

But I don't have any codez to show, or really anything more than a hunch at this point, so don't take me too seriously. :)

Indeed. The closest thing I can think of to what he is saying is Pareto coevolution.