Hacker News
by yeldarb 1175 days ago
Is there a good heuristic for picking a reasonable number of clusters automatically for an arbitrary set of vectors?
3 comments

I usually go with the Davies-Bouldin index (lower is better), but there are several methods:

Python/Sklearn: https://scikit-learn.org/stable/modules/clustering.html#clus...

R: https://cran.r-hub.io/web/packages/clusterCrit/clusterCrit.p...
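For the Python route, here is a minimal sketch of what picking k by Davies-Bouldin could look like. It assumes k-means as the clusterer and a candidate range of 2 to 10, neither of which comes from the comment above; the helper name `pick_k_davies_bouldin` and the synthetic data are made up for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score


def pick_k_davies_bouldin(X, k_values=range(2, 11), random_state=0):
    """Return (best_k, scores), where best_k minimizes the Davies-Bouldin index.

    Hypothetical helper: the clusterer (k-means) and the k range are assumptions.
    """
    scores = {}
    for k in k_values:
        # The index needs at least 2 clusters, so k starts at 2.
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = davies_bouldin_score(X, labels)
    # Lower Davies-Bouldin means tighter, better separated clusters.
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Toy data: three well-separated blobs, purely for illustration.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)
best_k, scores = pick_k_davies_bouldin(X)
print("best k:", best_k)
```

On real embeddings the minimum is often much less clear-cut than on blobs like these, so it is worth looking at the whole `scores` dict and not just the argmin.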

You could always try a density-based clusterer like DBSCAN, HDBSCAN or OPTICS, which does not take the number of clusters as an input, to determine a likely number of clusters.
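A rough sketch of the density-based idea with scikit-learn's DBSCAN: fit it, then count the distinct labels, ignoring the noise label -1. The `eps` and `min_samples` values below are assumptions picked for this toy data and will need tuning on real vectors; the helper name `count_dbscan_clusters` is made up.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs


def count_dbscan_clusters(X, eps=2.0, min_samples=5):
    """Return (n_clusters, n_noise) found by DBSCAN.

    Hypothetical helper: eps and min_samples are illustrative defaults only.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    # DBSCAN marks noise points with the label -1; they are not a cluster.
    n_clusters = len(set(labels) - {-1})
    n_noise = int(np.sum(labels == -1))
    return n_clusters, n_noise


# Toy data: three well-separated blobs, purely for illustration.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)
n_clusters, n_noise = count_dbscan_clusters(X)
print("clusters:", n_clusters, "noise points:", n_noise)
```

Note that this trades choosing k for choosing `eps`, so it is not fully automatic either.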
The elbow method is pretty common! https://en.wikipedia.org/wiki/Elbow_method_(clustering)
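The elbow is usually read off a plot by eye, but it can be approximated in code. The sketch below assumes k-means inertia as the curve and picks the point farthest below the straight line joining the ends of the normalized curve; that is just one simple heuristic, not a standard definition, and the helper name `elbow_k` is made up.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def elbow_k(X, k_values=range(1, 11), random_state=0):
    """Return (k_at_elbow, inertias) using a farthest-from-chord heuristic.

    Hypothetical helper: the heuristic and the k range are assumptions.
    """
    ks = np.array(list(k_values), dtype=float)
    inertias = np.array([
        KMeans(n_clusters=int(k), n_init=10, random_state=random_state).fit(X).inertia_
        for k in ks
    ])
    # Rescale both axes to [0, 1] so neither dominates the distance.
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (inertias - inertias.min()) / (inertias.max() - inertias.min())
    # For a decreasing curve the chord runs from (0, 1) to (1, 0), i.e. x + y = 1.
    # 1 - x - y is proportional to how far a point lies below that chord.
    gap = 1.0 - x - y
    return int(ks[np.argmax(gap)]), inertias


# Toy data: three well-separated blobs, purely for illustration.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)
k_elbow, inertias = elbow_k(X)
print("elbow at k =", k_elbow)
```

When the curve bends gradually, as it often does on real data, this kind of heuristic can land on a fairly arbitrary k.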

You can also use a penalized model-selection criterion (AIC, BIC, or similar), which trades goodness of fit against the number of parameters.
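One way to try this, sketched under the assumption that a Gaussian mixture is an acceptable model for the data: fit scikit-learn's `GaussianMixture` for each candidate k and keep the k with the lowest BIC. The helper name `pick_k_bic` and the k range are made up for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture


def pick_k_bic(X, k_values=range(1, 11), random_state=0):
    """Return (best_k, bics), where best_k has the lowest BIC.

    Hypothetical helper: the Gaussian mixture model and k range are assumptions.
    """
    bics = {}
    for k in k_values:
        gmm = GaussianMixture(n_components=k, random_state=random_state).fit(X)
        # BIC penalizes the log-likelihood by the number of free parameters.
        bics[k] = gmm.bic(X)
    best_k = min(bics, key=bics.get)
    return best_k, bics


# Toy data: three well-separated blobs, purely for illustration.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)
best_k, bics = pick_k_bic(X)
print("best k by BIC:", best_k)
```

`GaussianMixture` also has an `aic` method with the same call shape; AIC penalizes extra components less heavily, so it tends to favor larger k.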