Hacker News
by fluffet 2170 days ago
Good article.

I used HDBSCAN in my master's thesis. It works well with high-dimensional data. If you're using it on high-dimensional data, I would recommend Uniform Manifold Approximation and Projection (UMAP) for visualisation. I think it is made by the same author as HDBSCAN.

I wish they had also talked about Density-Based Clustering Validation (DBCV), which can be used to score the clusters numerically (e.g. for choosing hyperparameters), apart from just looking at hierarchies.

1 comments

This is sorta true, but not quite. Leland McInnes and John Healy (the creators of UMAP) do in fact have an amazing paper on HDBSCAN, but they didn't invent it. In that paper, https://arxiv.org/pdf/1705.07321.pdf, they introduce AHDBSCAN, a great extension of HDBSCAN that dramatically improves its performance.

Their work is great, but I just wanted to save people a Google search in case they were interested.

Never heard of AHDBSCAN, thank you for sharing!