Hacker News
by andrewmatte 2427 days ago
Yes, and what about the bait and switch from business data to flowers?

And K-means??? Why not HDBSCAN?

4 comments

Each time somebody points out K-means, I show them this clustering benchmark by the scikit-learn project: https://scikit-learn.org/stable/_images/sphx_glr_plot_cluste...
Wow, thanks so much for that. I was trying to figure out how to do clustering for geographic place names (from AIS data) and that one image answers so many questions for me.
I printed this out and put it on the wall by my desk a while back because of the number of questions people were asking me about various clustering algorithms.
Any accompanying text for the image?
Is anyone willing to explain the importance of this image to us? I'd like to be enlightened.
It really depends on your data and what kind of clustering you want. There isn't one "best" clustering algo. Sometimes you really DO want partitioning, and K-means works better. Sometimes it's agglomerative, for connecting thin threads. What I've found is that HDBSCAN is too conservative with its clusters. In practice it usually comes down to running the data through numerous models, seeing which are the most stable after parameter tuning, and picking what is usable by marketing.
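For anyone wondering what "seeing which are the most stable" could look like, here is a rough, stdlib-only sketch. It assumes one particular reading of "stable": rerun the same algorithm with different random initializations and measure how much the resulting labelings agree. The toy blobs, the helper names (`kmeans`, `rand_index`, `stability`, `make_blobs`) and the choice of the plain Rand index are all illustrative assumptions, not something from the comment above. With scikit-learn you would more likely reach for `sklearn.cluster.KMeans` and `sklearn.metrics.adjusted_rand_score`.

```python
import random
from itertools import combinations


def kmeans(points, k, seed, iters=50):
    """Plain Lloyd's k-means on 2-D tuples; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels


def rand_index(a, b):
    """Fraction of point pairs that two labelings treat the same way
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(points, k, seeds=range(5)):
    """Mean pairwise Rand index between k-means runs that differ only in their seed."""
    runs = [kmeans(points, k, seed) for seed in seeds]
    scores = [rand_index(x, y) for x, y in combinations(runs, 2)]
    return sum(scores) / len(scores)


def make_blobs(seed=0, per_blob=30):
    """Toy data (an assumption for illustration): three Gaussian blobs in 2-D."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in centers for _ in range(per_blob)]


if __name__ == "__main__":
    points = make_blobs()
    for k in (2, 3, 4, 5):
        print(k, round(stability(points, k), 3))
```

The Rand index here is not corrected for chance, so scores for different k are only roughly comparable. The same loop works for any algorithm that returns labels, which is the point: swap the model, keep the stability check, and compare.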
Just read this today! Also this lib is from the maker of the amazing UMAP dimension reduction lib.

https://hdbscan.readthedocs.io/en/latest/performance_and_sca...

Exactly! If it had at least used an actual customer dataset, as it implied at first that it would, it would have been slightly useful.