| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Spivak 240 days ago
	The total number of clusters. Determining this algorithmically is a fun open problem https://en.wikipedia.org/wiki/Determining_the_number_of_clus.... For the data I work with at $dayjob I've found the Silhouette algorithm to perform best but I assume it will be extremely field specific. Clustering your data and taking a representative sample of each cluster is such a powerful trick to make big data small but finding an appropriate K is an art more than a science.