| HN Mirror



	by infimum 1781 days ago
	scikit-learn (next to numpy) is the one library I use in every single project at work. Every time I consider switching away from python I am faced with the fact that I'd lose access to this workhorse of a library. Of course it's not all sunshine and rainbows - I had my fair share of rummaging through its internals - but its API design is a de-facto standard for a reason. My only recurring gripe is that the serialization story (basically just pickling everything) is not optimal.

3 comments

CapmCrackaWaka 1781 days ago

I recently ran into this issue as well. Serialization of sklearn random forests results in absolutely massive files. I had to switch to lightgbm, which is 100x faster to load from a save file and about 20x smaller.

kzrdude 1781 days ago

What's a typical task you do with sklearn? Just trying to get inspired about what it can do

zeec123 1781 days ago

There is so much wrong with the api design of sklearn (how can one think "predict_proba" is a good function name?). I can understand this, since most of it was probably written by PhD students without the time and expertise to come up with a proper api; many of them without a CS background.[1]

[1] https://www.reddit.com/r/haskell/comments/7brsuu/machine_lea...

kzrdude 1769 days ago

These seem like minor gripes (reading your link) - and I don't even agree with them; it seems like an ok use of mutable state (otherwise a separate object would be needed for hyperparameter state?). Maybe my expectations are low, but the way sklearn unifies the API across different estimators all across the library - that's already way above what you can expect - especially if you consider it to be "written by a bunch of PhD students".
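
For anyone following along, the trade-off can be sketched in plain Python. This is only an illustration, not sklearn's actual code: `MeanRegressor`, `MeanSpec`, and `FittedMean` are hypothetical names made up for the example. The first class follows the sklearn convention (hyperparameters set in `__init__`, learned state stored on the same object with a trailing underscore, `fit` returning `self`); the other two show the "separate object" alternative, where fitting returns a new immutable model.

```python
from dataclasses import dataclass
from statistics import fmean


class MeanRegressor:
    """sklearn-style (hypothetical toy): one mutable object holds everything."""

    def __init__(self, shift=0.0):
        self.shift = shift  # hyperparameter, set at construction time

    def fit(self, X, y):
        # Learned state is written onto self; the trailing underscore
        # mirrors sklearn's naming convention for fitted attributes.
        self.mean_ = fmean(y)
        return self  # returning self allows fit(...).predict(...) chaining

    def predict(self, X):
        return [self.mean_ + self.shift for _ in X]


@dataclass(frozen=True)
class MeanSpec:
    """Alternative design (hypothetical): immutable hyperparameters only."""

    shift: float = 0.0

    def fit(self, X, y):
        # Fitting does not mutate anything; it returns a separate fitted object.
        return FittedMean(mean=fmean(y), shift=self.shift)


@dataclass(frozen=True)
class FittedMean:
    """Immutable result of fitting a MeanSpec."""

    mean: float
    shift: float

    def predict(self, X):
        return [self.mean + self.shift for _ in X]


X = [[1], [2], [3]]
y = [1.0, 2.0, 3.0]

stateful = MeanRegressor(shift=0.5).fit(X, y).predict([[4]])
functional = MeanSpec(shift=0.5).fit(X, y).predict([[4]])
```

Both variants give the same predictions; the difference is only whether an unfitted and a fitted model are the same type. The single-object version is what lets every estimator share one `fit`/`predict` surface, at the cost of objects that can be in a "not yet fitted" state.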

mrtranscendence 1781 days ago

I didn't want to bag on sklearn (I've already bagged on pandas enough here), but for what it's worth I agree with you. It's, ahh, not the API I would've come up with. It's what everybody has standardized on, though, and maybe there's some value in that.