by bjornbsm 887 days ago
>I understood the GP's comment to refer to the state of the second grad student. How useful is K-means clustering without knowing K in advance?

There are several heuristics for this. Googling, I see that the elbow method, the average silhouette method, and the gap statistic method are the most used.
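To make the silhouette idea concrete, here is a rough stdlib-only sketch: run k-means for a few candidate values of K and keep the one with the highest average silhouette. The function names, the farthest-point seeding (chosen only to keep the example deterministic), and the toy data are my own assumptions, not a reference implementation.

```python
import math

def farthest_point_seeds(points, k):
    # Assumption: deterministic seeding; k-means++ or random restarts are more usual.
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds

def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_point_seeds(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def mean_silhouette(points, labels):
    """Average of (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0  # silhouette is undefined for a single cluster
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        total += (b - a) / max(a, b)
    return total / len(points)

def pick_k(points, k_values):
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores

# Toy data: three tight 3x3 grids of points around made-up centers.
offsets = (-0.5, 0.0, 0.5)
points = [(cx + dx, cy + dy)
          for cx, cy in ((0, 0), (10, 1), (4, 9))
          for dx in offsets for dy in offsets]
best_k, scores = pick_k(points, range(2, 7))
print(best_k)  # picks k = 3 for this toy data
```

On real data you would more likely reach for scikit-learn's `KMeans` and `silhouette_score` rather than hand-rolling this, and the silhouette curve is often much flatter than in a toy example like this one.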

I think you could play around with your own heuristics as well. A simple KDE plot would show the number of peaks. Or you could require that the variance between clusters be greater than the variance inside any cluster; that might work. (Edit: this seems to be the main point of the average silhouette method.)
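For the KDE idea, a minimal sketch for 1-D data might look like the following: estimate the density with a Gaussian kernel, then count local maxima as a guess for K. The bandwidth, grid size, and `min_rel_height` cutoff are hypothetical parameters I picked for illustration, and the peak count depends heavily on the bandwidth.

```python
import math

def kde(xs, grid, bandwidth):
    """Gaussian kernel density estimate of xs, evaluated at each grid point."""
    norm = len(xs) * bandwidth * math.sqrt(2 * math.pi)
    return [sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in xs) / norm
            for g in grid]

def count_peaks(xs, bandwidth, grid_size=200, min_rel_height=0.1):
    """Count local maxima of the KDE that reach min_rel_height of the tallest peak."""
    lo, hi = min(xs) - 3 * bandwidth, max(xs) + 3 * bandwidth
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    dens = kde(xs, grid, bandwidth)
    floor = min_rel_height * max(dens)
    return sum(
        1
        for i in range(1, grid_size - 1)
        # strict on the left, non-strict on the right: a two-point plateau counts once
        if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1] and dens[i] >= floor
    )

# Toy data: three tight groups of values around 0, 5 and 10.
xs = [c + d for c in (0, 5, 10) for d in (-0.4, -0.2, 0.0, 0.2, 0.4)]
print(count_peaks(xs, bandwidth=0.5))  # 3 peaks, suggesting K = 3
```

This only works directly in one dimension; for higher-dimensional data you would need to project first or count modes some other way.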