Hacker News new | ask | show | jobs
by fnord123 3616 days ago
Do you have any interest in implementing other clustering algorithms on GPU? e.g. HDBSCAN? Or is it not as parallelizable?
2 comments

Agree on HDBSCAN/DBSCAN, which is able to find the number of clusters in a large class of problems (unlike K-means, which requires that the number of clusters/centroids be provided as a hyperparameter, or found via some kind of search).

Otherwise, I just want to say to vmarkovtsev: thank you for this -- I will add it to my arsenal of tools, and may others will surely do so as well.

Thanks. Actually, I like DBSCAN a lot and use it often, though I am not much familiar with it's internals. It looks like it is iterative and thus does not fit very well to a GPU. The only way I see is to pick several seed points at start...
A Google search reveals this paper: https://arxiv.org/abs/1506.02226

This paper claims a "97x improvement" over traditional (non-parallelized) DBSCAN algorithms, but that's not a very helpful claim, because it does not indicate what the computational costs are as a function of, say, the number of data points or dimensions.

97x improvement is actually very suspicious. Thanks for the article!