	by fnord123 3616 days ago
	Do you have any interest in implementing other clustering algorithms on GPU? e.g. HDBSCAN? Or is it not as parallelizable?

cs702 3616 days ago

Agreed on HDBSCAN/DBSCAN, which can find the number of clusters on their own in a large class of problems (unlike K-means, which requires that the number of clusters/centroids be provided as a hyperparameter, or found via some kind of search).
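
To make the contrast with K-means concrete, here is a minimal pure-Python sketch of textbook DBSCAN. It is an illustration only, not vmarkovtsev's code: the function name `dbscan`, the brute-force neighbor search, and the toy points are all assumptions made for the example. Note that the caller supplies a radius (`eps`) and a density threshold (`min_pts`) but never a cluster count.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    n = len(points)
    # Brute-force eps-neighborhoods; each one includes the point itself.
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may turn out to be a border point
            continue
        cluster += 1  # point i is an unvisited core point: start a new cluster
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])  # only core points keep expanding
    return labels

# Hypothetical toy data: two tight groups and one far-away outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
n_clusters = len(set(labels) - {-1})
```

On this toy data the two groups come out as two clusters and the outlier is labelled noise, without the cluster count ever being passed in.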

Otherwise, I just want to say to vmarkovtsev: thank you for this -- I will add it to my arsenal of tools, and many others will surely do so as well.

vmarkovtsev 3616 days ago

Thanks. Actually, I like DBSCAN a lot and use it often, though I am not very familiar with its internals. It looks like it is iterative and thus does not map very well onto a GPU. The only way I see is to pick several seed points at start...
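
One way to picture a more data-parallel formulation is sketched below in plain Python. This is an assumption-laden illustration, not the method of any paper or library mentioned in this thread: the name `dbscan_label_propagation` is hypothetical, and the loops over points only stand in for what would be one GPU thread per point. The idea is that core-point detection is independent per point, and cluster growth becomes repeated rounds in which every core point takes the smallest label among its core neighbors.

```python
from math import dist

def dbscan_label_propagation(points, eps, min_pts):
    """DBSCAN-style labels via min-label propagation; -1 marks noise."""
    n = len(points)
    # Step 1 (independent per point): eps-neighborhoods, including the point itself.
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    # Step 2 (independent per point): core test.
    core = [len(nb) >= min_pts for nb in neighbors]
    # Every core point starts as its own cluster, labelled by its index.
    labels = [i if core[i] else -1 for i in range(n)]
    changed = True
    while changed:
        # Step 3: one round; each core point reads only the previous round's labels,
        # so all points in a round could be updated at the same time.
        new_labels = list(labels)
        for i in range(n):
            if core[i]:
                new_labels[i] = min(labels[j] for j in neighbors[i] if core[j])
        changed = new_labels != labels
        labels = new_labels
    # Step 4 (independent per point): border points join a neighboring core's cluster.
    final = list(labels)
    for i in range(n):
        if not core[i]:
            core_labels = [labels[j] for j in neighbors[i] if core[j]]
            if core_labels:
                final[i] = min(core_labels)
    return final

# Hypothetical toy data: two tight groups and one far-away outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
result = dbscan_label_propagation(points, eps=1.5, min_pts=3)
```

The procedure is still iterative: the number of rounds grows with the longest chain of core points in a cluster, so long, thin clusters take many rounds. Each round, however, does the same small amount of work for every point. Cluster labels here are the smallest core-point index in each cluster rather than consecutive integers.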

cs702 3615 days ago

A Google search reveals this paper: https://arxiv.org/abs/1506.02226

This paper claims a "97x improvement" over traditional (non-parallelized) DBSCAN algorithms, but that's not a very helpful claim, because it does not indicate what the computational costs are as a function of, say, the number of data points or dimensions.

vmarkovtsev 3615 days ago

97x improvement is actually very suspicious. Thanks for the article!

fnl 3615 days ago

DBSCAN certainly is parallelizable, but I'm not sure there is a FOSS implementation...

http://www.sciencedirect.com/science/article/pii/S1877050913...

https://www.researchgate.net/publication/221614133_Density-b...
