by elashri
1175 days ago
Hi Mo, thanks for this work. It looks interesting. I had the chance to play with it a little and wanted to compare it with KMeans, using the sklearn KMeans implementation. I ran some of the available examples, but I also generated some isotropic Gaussian blobs for clustering (using `make_blobs`) and compared the two methods on those. With `n_samples=1000`, BanditPAM was slightly better on a couple of the metrics I used, and also much faster. When I increased this to `n_samples=10000`, however, it became much slower than KMeans; see [1], and the code is in [2]. Is there a particular reason for that? [1] https://imgur.com/a/VibpgNz [2] https://paste.elashri.xyz/aXCE
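For readers without access to [2], a comparison of this kind might look roughly like the sketch below. It is not the code from the paste: the function names, the choice of 5 clusters, and the silhouette score as the metric are all assumptions made for illustration. The BanditPAM call assumes the `banditpam` package's `KMedoids(n_medoids=..., algorithm="BanditPAM")` interface with `fit(X, "L2")` and a `labels` attribute, as described in the project README, and is skipped if the package is not installed.

```python
import time

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def _score(X, labels, seed=0):
    # Silhouette on at most 1000 points so the metric itself stays cheap.
    return silhouette_score(
        X, labels, sample_size=min(1000, len(X)), random_state=seed
    )


def time_kmeans(n_samples, n_clusters=5, seed=0):
    """Fit sklearn KMeans on isotropic Gaussian blobs; return (seconds, score, labels)."""
    X, _ = make_blobs(n_samples=n_samples, centers=n_clusters, random_state=seed)
    start = time.perf_counter()
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    elapsed = time.perf_counter() - start
    return elapsed, _score(X, model.labels_, seed), model.labels_


def time_banditpam(n_samples, n_clusters=5, seed=0):
    """Same measurement for BanditPAM; returns None if banditpam is not installed."""
    try:
        from banditpam import KMedoids  # assumed API, taken from the project README
    except ImportError:
        return None
    X, _ = make_blobs(n_samples=n_samples, centers=n_clusters, random_state=seed)
    start = time.perf_counter()
    kmed = KMedoids(n_medoids=n_clusters, algorithm="BanditPAM")
    kmed.fit(X, "L2")
    elapsed = time.perf_counter() - start
    return elapsed, _score(X, kmed.labels, seed), kmed.labels


if __name__ == "__main__":
    for n in (1000, 10000):
        km_time, km_score, _ = time_kmeans(n)
        print(f"n={n} KMeans: {km_time:.3f}s, silhouette={km_score:.3f}")
        result = time_banditpam(n)
        if result is not None:
            bp_time, bp_score, _ = result
            print(f"n={n} BanditPAM: {bp_time:.3f}s, silhouette={bp_score:.3f}")
```

Timing only the `fit` call (not data generation or scoring) keeps the two measurements comparable as `n_samples` grows.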
|
I suspect that this is because the scikit-learn implementation of KMeans subsamples the data and uses some highly optimized data structures for larger datasets. I've asked the team to look into how we can use some of those techniques in BanditPAM, and we will update the GitHub repo as we learn more and improve our implementation.