by mattnedrich 3700 days ago

I wrote an article about mean-shift a while back if anyone is interested in more details about it - https://spin.atomicobject.com/2015/05/26/mean-shift-clusteri...

Some comments on K-Means: one large limitation of K-Means is that it assumes roughly spherical clusters. It tends to fail badly on elongated or irregularly shaped clusters.
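
To make the spherical-cluster point concrete, here is a minimal sketch. It is not from the article: the toy data (two long, thin stripes) and every function name are made up for illustration, and the loop is a plain-Python version of the usual Lloyd's iteration rather than any library's implementation. It compares the K-Means objective (within-cluster sum of squares) for the "true" stripe labelling against what the iteration actually converges to.

```python
# Toy data: two long, thin horizontal stripes, i.e. clearly non-spherical clusters.
# Everything here (data, names, initial centers) is hypothetical.

def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def centroid(points):
    """Mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def inertia(points, labels, k):
    """Within-cluster sum of squared distances: the quantity K-Means minimizes."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            center = centroid(members)
            total += sum(sq_dist(p, center) for p in members)
    return total

def kmeans(points, centers, max_iter=100):
    """Plain Lloyd's iteration: assign to nearest center, recompute centers, repeat."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = centroid(members)
    return labels

# Stripe A lies along y = 0, stripe B along y = 1, both spanning x = 0..19.
points = [(float(x), 0.0) for x in range(20)] + [(float(x), 1.0) for x in range(20)]
true_labels = [0] * 20 + [1] * 20

# Start from two opposite corners of the data.
found = kmeans(points, centers=[points[0], points[-1]])

print("objective for the true stripes:", inertia(points, true_labels, 2))
print("objective for the K-Means result:", inertia(points, found, 2))
print("K-Means labels:", found)
```

On this toy data the iteration splits the points into a left half and a right half, cutting both stripes in two, and that split has a much lower objective value than the true stripe labelling. So the issue is not just bad initialization: the objective itself prefers compact blobs over long, thin clusters.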

It's interesting that the author compared results on the same data set for the different algorithms. Each clustering approach is going to work best on a specific type of data set. It would be interesting to compare them across several different data sets to get a better feel for strengths/weaknesses, etc.

2 comments

lmcinnes 3700 days ago

I agree that on some level more data sets would be nice, but I felt that it cluttered and obscured the exposition. Instead I used the one synthetic dataset, but crafted it to have various properties (noise, cluster shape, variable density, non-standard distributions) that will confound many different clustering approaches ... it is meant to be the "hard" case, with all the difficulties and confounding factors rolled into one dataset.

mattnedrich 3700 days ago

Cool, I think you did a great job. Do you have run time data for each algorithm on that data set?

lmcinnes 3700 days ago

It's included in the upper left corner of the plots. To be fair, these are for the sklearn implementations, some of which are excellent, but I can't speak for the performance of all of them.

danvoell 3700 days ago

"It would be interesting to compare them across several different data sets to get a better feel for strengths/weaknesses, etc." I agree. I think that would be the logical next step for this article: show various real-world examples and describe why certain clustering approaches might be better for these types of problems. But all in all, awesome article, thanks for the education.
