Hacker News
by FreakLegion 2974 days ago
For optimizations, have a look at section 3 of the XGBoost paper: https://arxiv.org/pdf/1603.02754.pdf. LightGBM has similar features (e.g. binning: http://lightgbm.readthedocs.io/en/latest/Parameters.html#io-...).
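
To make the binning idea concrete, here's a rough sketch of histogram-style feature binning with NumPy. It uses plain unweighted quantiles, so it is not the weighted quantile sketch from the XGBoost paper and not LightGBM's actual binning code; `bin_feature` and `max_bins` are names made up for illustration.

```python
import numpy as np

def bin_feature(values, max_bins=255):
    """Map a float feature column to small integer bin codes (illustrative only)."""
    # Candidate bin edges sit at evenly spaced interior quantiles of the feature.
    quantiles = np.linspace(0.0, 1.0, max_bins + 1)[1:-1]
    # Duplicate edges (from repeated values) are collapsed, so there may be fewer bins.
    edges = np.unique(np.quantile(values, quantiles))
    # Each value is replaced by the index of the bin it falls into.
    # With at most 254 interior edges, codes range over 0..254 and fit in uint8.
    codes = np.searchsorted(edges, values, side="right").astype(np.uint8)
    return codes, edges
```

Split finding then only has to scan the bin boundaries rather than every distinct feature value, and the binned column takes one byte per value instead of eight.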

scikit-learn is shockingly slow in comparison. It's also bloated, though that's mostly a matter of 1) not having a "release" impl that ditches data only useful for debugging, and 2) using 64-bit data types all over the place, despite storing everything in parallel arrays, where narrower types would be easy to use! (https://github.com/scikit-learn/scikit-learn/blob/master/skl...)
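
For a sense of scale on the second point, here's a minimal sketch of tree nodes stored as parallel arrays, once with 64-bit fields and once with 32-bit ones. The field names and the `node_arrays` / `total_bytes` helpers are hypothetical, and this is not scikit-learn's actual node layout.

```python
import numpy as np

def node_arrays(n_nodes, wide=True):
    """Allocate per-node fields as parallel arrays (hypothetical layout)."""
    # Assumption: indices fit in 32 bits and float32 thresholds are precise enough.
    index_t = np.int64 if wide else np.int32
    float_t = np.float64 if wide else np.float32
    return {
        "left_child": np.zeros(n_nodes, dtype=index_t),
        "right_child": np.zeros(n_nodes, dtype=index_t),
        "feature": np.zeros(n_nodes, dtype=index_t),
        "threshold": np.zeros(n_nodes, dtype=float_t),
    }

def total_bytes(arrays):
    """Sum the raw buffer sizes of all field arrays."""
    return sum(a.nbytes for a in arrays.values())
```

Because each field lives in its own array, its dtype can be narrowed independently, with no struct padding to worry about. With these four fields, going from 64-bit to 32-bit halves the per-node footprint.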