by vmarkovtsev
3614 days ago
1. The number of samples must not exceed UINT32_MAX, that is, about 4.3*10^9. The number of clusters must not exceed UINT32_MAX either. The number of dimensions must not exceed 12288 (a GPU shared memory constraint). We successfully tested with 4M samples and 480 dimensions (the product is greater than UINT32_MAX) to check for potential overflows. Practically speaking, it will not take ages if the product of samples, clusters and dimensions does not exceed 10^14.

2. No, unfortunately I don't. Implementing a tree on a GPU is a real pain and the performance will still be bad, so I didn't even consider them. The common problem with advanced approaches is the memory overhead. A high-end GPU has only 12 GB of memory, and everything has to fit in it. E.g. Yinyang becomes inapplicable at 500000 samples, 10000 clusters and 480 dimensions (though only the product of samples and clusters matters much).