| HN Mirror


by dilippkumar 886 days ago

Not GP, but I understood their question as follows.

Suppose you gather some kindergartners and top NBA players in a room and measure their heights. Now say you hand the measurements to two hapless grad students and ask them to perform K-means clustering.

Suppose one of the grad students knows the composition of the group you measured and can guess that the heights should clump into 2 nice clusters. What should the other student, who doesn't know the composition of the group, guess K to be?

I understood the GP's comment to refer to the state of the second grad student. How useful is K-means clustering without knowing K in advance?

1 comment

bjornbsm 886 days ago

>I understood the GP's comment to refer to the state of the second grad student. How useful is K-means clustering without knowing K in advance?

There are several heuristics for this. From a quick Google search, the elbow method, the average silhouette method, and the gap statistic method seem to be the most used.
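To make the silhouette idea concrete, here is a rough sketch in plain Python for the kindergartner/NBA example. Everything specific in it is an assumption for illustration: the heights are made-up seeded random draws (in cm), `kmeans_1d` and `average_silhouette` are helper names picked here rather than any library's API, and the quantile-based initialization is just a simple way to keep the run deterministic.

```python
import random
from statistics import mean

def kmeans_1d(xs, k, iters=100):
    """Plain Lloyd's algorithm on 1-D data; returns one cluster label per point."""
    s = sorted(xs)
    # Deterministic start (an assumption): initial centers at evenly spaced quantiles.
    centers = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in xs]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:
                centers[j] = mean(members)
    return labels

def average_silhouette(xs, labels):
    """Mean of (b - a) / max(a, b) over all points, using absolute distance."""
    clusters = {}
    for x, lab in zip(xs, labels):
        clusters.setdefault(lab, []).append(x)
    if len(clusters) < 2:
        return 0.0  # silhouette is undefined for a single cluster
    scores = []
    for x, lab in zip(xs, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(abs(x - y) for y in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(mean(abs(x - y) for y in other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)

# Made-up heights in cm: 30 "kindergartners" and 30 "NBA players".
random.seed(0)
heights = ([random.gauss(110, 5) for _ in range(30)]
           + [random.gauss(200, 8) for _ in range(30)])

# Score each candidate K and keep the one with the highest average silhouette.
scores = {k: average_silhouette(heights, kmeans_1d(heights, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(scores)
print("best K by average silhouette:", best_k)
```

The second grad student would run something like this over a range of candidate K values and pick the one with the highest score. With two groups this far apart, K = 2 should win comfortably; on messier data the scores can be close, so it is a heuristic rather than a guarantee.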

I think you could play around with your own heuristics as well. Simple KDE plots would show the number of peaks. Or maybe require that the variance between clusters be greater than the variance inside any cluster. (Edit: this seems to be the main point of the average silhouette method).
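The KDE idea can be sketched without plotting by evaluating the density on a grid and counting local maxima. This is only an illustration under stated assumptions: the heights are made-up seeded draws (in cm), `gaussian_kde` and `count_peaks` are hypothetical helper names, and the bandwidth uses the normal-reference rule of thumb h = 1.06 * sd * n^(-1/5), which is one common default and not the only reasonable choice.

```python
import math
import random
from statistics import stdev

def gaussian_kde(xs, h):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(xs) * h * math.sqrt(2 * math.pi))
    def density(t):
        return norm * sum(math.exp(-0.5 * ((t - x) / h) ** 2) for x in xs)
    return density

def count_peaks(xs, grid_points=400):
    """Count strict local maxima of the KDE evaluated on an evenly spaced grid."""
    # Bandwidth from the normal-reference rule of thumb (an assumption).
    h = 1.06 * stdev(xs) * len(xs) ** (-0.2)
    lo, hi = min(xs) - 3 * h, max(xs) + 3 * h
    density = gaussian_kde(xs, h)
    grid = [lo + (hi - lo) * i / (grid_points - 1) for i in range(grid_points)]
    ys = [density(t) for t in grid]
    return sum(1 for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] > ys[i + 1])

# Made-up heights in cm: 30 "kindergartners" and 30 "NBA players".
random.seed(0)
heights = ([random.gauss(110, 5) for _ in range(30)]
           + [random.gauss(200, 8) for _ in range(30)])

peaks = count_peaks(heights)
print("number of KDE peaks:", peaks)
```

The peak count could then serve as a first guess for K. The bandwidth is the knob to watch: a rule-of-thumb bandwidth tends to oversmooth multimodal data, so clusters that sit closer together than these two may merge into a single peak, while a very small bandwidth produces spurious peaks.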