HN Mirror

	by andrewmatte 2427 days ago
	Yes, and the bait and switch from business data to flowers? And K-means??? Why not HDBSCAN?

Buetol 2427 days ago

Each time somebody points out K-means, I show them this clustering benchmark by the scikit-learn project: https://scikit-learn.org/stable/_images/sphx_glr_plot_cluste...

wpietri 2427 days ago

Wow, thanks so much for that. I was trying to figure out how to do clustering for geographic place names (from AIS data) and that one image answers so many questions for me.

Godel_unicode 2427 days ago

I printed this out and put it on the wall by my desk a while back because of the number of questions people were asking me about various clustering algorithms.

vasili111 2427 days ago

Any accompanying text for the image?

Buetol 2427 days ago

Yes, here's the context: https://scikit-learn.org/stable/modules/clustering.html

carokann 2427 days ago

Anyone willing to describe the importance of this image for us? I'd like to be enlightened.

starpilot 2427 days ago

Really depends on your data and what clustering you want. There isn't one "best" clustering algo. Sometimes you really DO want partitioning, and KMeans works better. Sometimes it's agglomerative for connecting thin threads. What I've found is that HDBSCAN is too conservative in forming clusters. In practice it usually comes down to running the data through numerous models, seeing which are the most stable after parameter tuning, and checking what is usable by marketing.
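
A rough sketch of the "no single best algo" point, assuming scikit-learn is installed. This is not starpilot's actual workflow: the two toy datasets, their parameters, and the `compare` helper are assumptions made up for illustration. It fits KMeans and single-linkage agglomerative clustering on the same data and scores each against the known labels with the adjusted Rand index.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs, make_moons
from sklearn.metrics import adjusted_rand_score


def compare(X, y_true):
    """Fit each model on X and return its adjusted Rand index vs. y_true.

    `compare` is a hypothetical helper for this sketch, not a library function.
    """
    models = {
        "kmeans": KMeans(n_clusters=2, n_init=10, random_state=0),
        "single_linkage": AgglomerativeClustering(n_clusters=2, linkage="single"),
    }
    return {
        name: adjusted_rand_score(y_true, model.fit_predict(X))
        for name, model in models.items()
    }


# Thin, curved clusters: the "connecting thin threads" case.
moons_X, moons_y = make_moons(n_samples=300, noise=0.03, random_state=0)

# Two overlapping round clusters: the "you really DO want partitioning" case.
blobs_X, blobs_y = make_blobs(
    n_samples=300, centers=[[-3, 0], [3, 0]], cluster_std=1.5, random_state=0
)

moons_scores = compare(moons_X, moons_y)
blobs_scores = compare(blobs_X, blobs_y)

print("moons:", moons_scores)
print("blobs:", blobs_scores)
```

On data like this, the ranking of the two models should flip between the datasets, which is roughly what the scikit-learn comparison image shows across many more algorithms. Real data has no ground-truth labels, so there you would have to fall back on stability across runs and parameter settings instead.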

lootsauce 2427 days ago

Just read this today! Also this lib is from the maker of the amazing UMAP dimension reduction lib.

https://hdbscan.readthedocs.io/en/latest/performance_and_sca...

lawlorino 2427 days ago

Exactly! If it had at least used an actual customer dataset, as it first implied it would, it would have been slightly useful.
