by sgt101 2044 days ago
An alternative is to relabel the data with the ensemble's outputs and then learn a decision tree over that.
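A rough sketch of what that could look like, using only the standard library. Everything here is an assumption for illustration: `ensemble_predict` is a hypothetical stand-in for a real trained ensemble (three hand-written rules taking a majority vote), and the small Gini-based tree is a toy learner, not the method from any paper linked in this thread.

```python
import random
from collections import Counter

def ensemble_predict(x):
    """Hypothetical stand-in for a trained ensemble: three rules, majority vote."""
    votes = [x[0] > 0.3, x[1] > 0.6, x[0] + x[1] > 1.0]
    return int(sum(votes) >= 2)

def gini(labels):
    """Gini impurity of a non-empty list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def fit_tree(X, y, depth=0, max_depth=4):
    """Grow a small tree. Internal nodes are tuples, leaves are majority labels."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]
    f, t = split
    left = [(xi, yi) for xi, yi in zip(X, y) if xi[f] <= t]
    right = [(xi, yi) for xi, yi in zip(X, y) if xi[f] > t]
    return (
        f,
        t,
        fit_tree([p[0] for p in left], [p[1] for p in left], depth + 1, max_depth),
        fit_tree([p[0] for p in right], [p[1] for p in right], depth + 1, max_depth),
    )

def tree_predict(tree, x):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

random.seed(0)
X = [(random.random(), random.random()) for _ in range(300)]

# The relabelling step: discard the original labels, use the ensemble's outputs.
y_relabelled = [ensemble_predict(x) for x in X]
tree = fit_tree(X, y_relabelled)

# Fidelity: how often the tree agrees with the ensemble on points it never saw.
X_new = [(random.random(), random.random()) for _ in range(1000)]
fidelity = sum(tree_predict(tree, x) == ensemble_predict(x) for x in X_new) / len(X_new)
print(f"agreement with ensemble on fresh points: {fidelity:.1%}")
```

The number to look at is agreement with the ensemble rather than accuracy against ground truth, since the tree is meant to describe the model. Anything short of 100% means there are regions where the tree's rules and the ensemble's behaviour differ.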

That works to a point, but it doesn't necessarily find all the rules of the model. In the post I walked through a model with three training records (yellow, blue, red) which created six prediction boundaries. Half of the rules weren't covered by the training data, which makes them hard to find without an efficient algorithm to search out all possible rules. The risk with undiscovered rules is that they may cause unexpected behaviour that leads to bad predictions, and if you haven't described the whole model, it is impossible to know how many of these potentially bad predictions exist.
Do you have any references/explainers for that approach? Would be interested to read!
The best I have is this one I wrote a long time ago:

https://www.aaai.org/Papers/Workshops/1999/WS-99-06/WS99-06-...

But I apologize! It's a bit pimped up compared to the one-liner above. I think step 7 in section 4.3 is what I was thinking of :) I did laugh when I dug it out, as I have been working on the first bullet in the conclusion this week!

Check out this work from Rich Caruana & collaborators on model compression: http://www.cs.cornell.edu/~caruana/compression.kdd06.pdf

which was a precursor to the model distillation work from Geoff Hinton: https://arxiv.org/abs/1503.02531

I've made some experiments that you can check out here:

http://www.clungu.com/Distilling_a_Random_Forest_with_a_sing...