|
|
|
|
|
by Spivak
240 days ago
|
|
The total number of clusters. Determining this algorithmically is a fun open problem https://en.wikipedia.org/wiki/Determining_the_number_of_clus.... For the data I work with at $dayjob I've found the Silhouette algorithm to perform best but I assume it will be extremely field specific. Clustering your data and taking a representative sample of each cluster is such a powerful trick to make big data small but finding an appropriate K is an art more than a science. |
|