by bagels
886 days ago
You have to choose the number of clusters before running k-means. Imagine you have a dataset that you think contains meaningful clusters, but you don't know how many, especially when it has many dimensions and you can't just eyeball a plot. If you pick a k that is too small, you lump unrelated points together. If k is too large, your meaningful clusters get fragmented (overfitted). To make up for this, some algorithms try to estimate the number of clusters directly, or search for the k that best fits the data (e.g. the elbow method, silhouette analysis, the gap statistic, or X-means).
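Here's a minimal sketch of the "search for the best k" idea, assuming you score each candidate by its mean silhouette coefficient: run k-means for every k in a range and keep the one whose clustering scores highest. It's plain Python (3.8+ for math.dist); the names kmeans, silhouette and choose_k are hypothetical helpers made up for illustration, not any library's API, and silhouette is just one possible criterion.

```python
import math
import random


def kmeans(points, k, n_init=10, iters=100, seed=0):
    """Lloyd's algorithm with k-means++ seeding; keeps the lowest-inertia run."""
    rng = random.Random(seed)
    best_labels, best_inertia = None, math.inf
    for _run in range(n_init):
        # k-means++ seeding: each new center is drawn with probability
        # proportional to its squared distance from the nearest chosen center.
        centers = [rng.choice(points)]
        while len(centers) < k:
            d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
            centers.append(rng.choices(points, weights=d2)[0])
        labels = None
        for _ in range(iters):
            # Assignment step: attach each point to its nearest center.
            new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                          for p in points]
            if new_labels == labels:
                break  # assignments stopped changing, so we've converged
            labels = new_labels
            # Update step: move each center to the mean of its members.
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:
                    centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
        inertia = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treat it as the worst score
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of p's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, k_values):
    """Return the k whose k-means clustering has the highest mean silhouette."""
    return max(k_values, key=lambda k: silhouette(points, kmeans(points, k)))


if __name__ == "__main__":
    # Toy data: three well-separated 2-D blobs of 20 points each.
    rng = random.Random(42)
    blob_centres = [(0, 0), (10, 0), (0, 10)]
    data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in blob_centres for _ in range(20)]
    print(choose_k(data, range(2, 7)))
```

One caveat with this choice: the silhouette is undefined for k = 1, so it can only pick among k >= 2. If "there are no real clusters" is a possible answer, something like the gap statistic, which compares against a no-structure reference, is a better fit.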