

	by lmcinnes 3702 days ago
	I agree that on some level more datasets would be nice, but I felt that they cluttered and obscured the exposition. Instead I used one synthetic dataset, but crafted it to have various properties (noise, cluster shape, variable density, non-standard distributions) that will confound many different clustering approaches. It is meant to be the "hard" case, with all the difficulties and confounding factors rolled into one dataset.
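A minimal sketch of how one might build a single synthetic 2D dataset that combines those properties, using NumPy. The `make_hard_dataset` name, the cluster positions, sizes and scales are all hypothetical choices for illustration, not the dataset used in the original comparison:

```python
import numpy as np


def make_hard_dataset(seed=0):
    """Return (X, y); y is the generating cluster, or -1 for background noise."""
    rng = np.random.default_rng(seed)
    # Dense, compact Gaussian cluster.
    dense = rng.normal(loc=(0.0, 0.0), scale=0.2, size=(200, 2))
    # Sparse, wide Gaussian cluster: similar shape, very different density.
    sparse = rng.normal(loc=(5.0, 5.0), scale=1.0, size=(200, 2))
    # Elongated cluster: an isotropic Gaussian sheared by a linear map.
    shear = np.array([[2.0, 0.0], [1.5, 0.3]])
    elongated = rng.normal(size=(200, 2)) @ shear.T + np.array([-5.0, 4.0])
    # Non-convex ring-shaped cluster.
    theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
    radius = 2.0 + rng.normal(scale=0.1, size=200)
    ring = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
    ring += np.array([5.0, -4.0])
    # Skewed, non-Gaussian cluster drawn from an exponential distribution.
    skewed = rng.exponential(scale=0.5, size=(200, 2)) + np.array([-5.0, -5.0])

    parts, labels = [], []
    for label, pts in enumerate([dense, sparse, elongated, ring, skewed]):
        parts.append(pts)
        labels.append(np.full(len(pts), label))

    # Uniform background noise over a box covering all clusters, labelled -1.
    noise = rng.uniform(-9.0, 9.0, size=(150, 2))
    parts.append(noise)
    labels.append(np.full(len(noise), -1))
    return np.vstack(parts), np.concatenate(labels)


X, y = make_hard_dataset()
print(X.shape)  # (1150, 2)
```

Each cluster stresses a different assumption: the dense/sparse pair breaks a single global density threshold, the ring and sheared blob break convex or spherical cluster assumptions, the exponential blob breaks Gaussian assumptions, and the uniform noise penalizes methods that must assign every point to a cluster.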


mattnedrich 3702 days ago

Cool, I think you did a great job. Do you have runtime data for each algorithm on that dataset?


lmcinnes 3702 days ago

It's included in the upper-left corner of the plots. To be fair, these timings are for the sklearn implementations, some of which are excellent, but I can't speak for the performance of all of them.
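A minimal sketch of how per-algorithm wall-clock timings like these could be collected, assuming scikit-learn is installed. The `time_clusterers` helper, the `make_blobs` stand-in data and the parameter values are illustrative assumptions, not the code behind the original plots:

```python
import time

from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs


def time_clusterers(X, clusterers):
    """Return {name: (seconds, labels)} from timing each estimator's fit_predict."""
    results = {}
    for name, est in clusterers.items():
        start = time.perf_counter()
        labels = est.fit_predict(X)
        results[name] = (time.perf_counter() - start, labels)
    return results


# Stand-in data; any (n_samples, n_features) array works here.
X, _ = make_blobs(n_samples=1000, centers=4, random_state=0)
clusterers = {
    "KMeans": KMeans(n_clusters=4, n_init=10, random_state=0),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5),
    "Agglomerative": AgglomerativeClustering(n_clusters=4),
}
for name, (seconds, labels) in time_clusterers(X, clusterers).items():
    print(f"{name}: {seconds:.3f}s")
```

A single run like this only measures one implementation on one machine at one data size, which is why such numbers say as much about the particular library code as about the algorithm itself.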
