Agree on HDBSCAN/DBSCAN, which is able to find the number of clusters in a large class of problems (unlike K-means, which requires that the number of clusters/centroids be provided as a hyperparameter, or found via some kind of search).
Otherwise, I just want to say to vmarkovtsev: thank you for this -- I will add it to my arsenal of tools, and may others will surely do so as well.
Thanks. Actually, I like DBSCAN a lot and use it often, though I am not much familiar with it's internals. It looks like it is iterative and thus does not fit very well to a GPU. The only way I see is to pick several seed points at start...
This paper claims a "97x improvement" over traditional (non-parallelized) DBSCAN algorithms, but that's not a very helpful claim, because it does not indicate what the computational costs are as a function of, say, the number of data points or dimensions.
Otherwise, I just want to say to vmarkovtsev: thank you for this -- I will add it to my arsenal of tools, and may others will surely do so as well.