|
|
|
|
|
by westurner
83 days ago
|
|
ScholarlyArticle: "Flash-KMeans: Fast and Memory-Efficient Exact K-Means" (2026) https://arxiv.org/abs/2603.09229 .. gscholar: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C43&q=Fla... : > Abstract: [...] Flash-kmeans introduces two core kernel-level innovations: (1) FlashAssign, which fuses distance computation with an online argmin to completely bypass intermediate memory materialization; > (2) sort-inverse update, which explicitly constructs an inverse mapping to transform high-contention atomic scatters into high-bandwidth, segment-level localized reductions. > Furthermore, we integrate algorithm-system co-designs, including chunked-stream overlap and cache-aware compile heuristics, to ensure practical deployability. > [...] flash-kmeans achieves up to 17.9X end-to-end speedup over best baselines, while outperforming industry-standard libraries like cuML and FAISS by 33X and over 200X, respectively. k-means clustering > Algorithms > Variations:
https://en.wikipedia.org/wiki/K-means_clustering#Variations |
|