by jamesrom
2894 days ago
Using a Hilbert curve for indexing is a pretty standard approach for geospatial queries. Nothing new here. However, using a Hilbert curve for sharding doesn't seem like the best approach. You can partition by anything you like; the shard boundaries don't have to be arbitrary points along your index. Sharding 2D data by cutting a 1-dimensional ordering into ranges isn't optimal. For example, construct a heatmap of your 'load score' and partition it into shards in two dimensions. Then use an S2 curve to index within each shard.
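A minimal sketch of the heatmap idea, assuming the load score has already been aggregated into a square grid of integer cells whose side is a power of two. The comment doesn't say how the 2D partition should be computed, so this swaps in recursive bisection: each round cuts every rectangle across its longer side where the two halves' loads are closest. For the within-shard index it uses a plain 2D Hilbert index (the classic xy-to-d conversion) as a stand-in for S2, which applies Hilbert curves to the faces of a cube rather than to a flat grid. All names here (`shard_by_load`, `hilbert_d`, etc.) are hypothetical.

```python
from typing import List, Tuple

# A shard is a half-open rectangle of grid cells: (x0, x1, y0, y1).
Rect = Tuple[int, int, int, int]


def rect_load(load: List[List[int]], r: Rect) -> int:
    x0, x1, y0, y1 = r
    return sum(load[y][x] for y in range(y0, y1) for x in range(x0, x1))


def split_rect(load: List[List[int]], r: Rect) -> List[Rect]:
    """Cut r across its longer side where the two halves' loads are closest."""
    x0, x1, y0, y1 = r
    if x1 - x0 == 1 and y1 - y0 == 1:
        return [r]  # a single cell cannot be split further
    split_x = (x1 - x0) >= (y1 - y0)
    lo, hi = (x0, x1) if split_x else (y0, y1)
    total = rect_load(load, r)
    acc, best_cut, best_diff = 0, lo + 1, None
    for cut in range(lo + 1, hi):
        # Add the load of the column (or row) just before this cut.
        if split_x:
            acc += sum(load[y][cut - 1] for y in range(y0, y1))
        else:
            acc += sum(load[cut - 1][x] for x in range(x0, x1))
        diff = abs(total - 2 * acc)  # |load before cut - load after cut|
        if best_diff is None or diff < best_diff:
            best_cut, best_diff = cut, diff
    if split_x:
        return [(x0, best_cut, y0, y1), (best_cut, x1, y0, y1)]
    return [(x0, x1, y0, best_cut), (x0, x1, best_cut, y1)]


def shard_by_load(load: List[List[int]], depth: int) -> List[Rect]:
    """Bisect the heatmap `depth` times, giving up to 2**depth shards."""
    shards = [(0, len(load[0]), 0, len(load))]
    for _ in range(depth):
        shards = [part for r in shards for part in split_rect(load, r)]
    return shards


def hilbert_d(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) on a Hilbert curve over an n-by-n grid (n a power of 2)."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the sub-curve lines up
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def cells_in_order(r: Rect, n: int) -> List[Tuple[int, int]]:
    """Cells of one shard, sorted by Hilbert index (the within-shard key)."""
    x0, x1, y0, y1 = r
    cells = [(x, y) for y in range(y0, y1) for x in range(x0, x1)]
    return sorted(cells, key=lambda c: hilbert_d(n, c[0], c[1]))
```

With a hot top-left quadrant on a 4x4 grid (cells of load 8 there, 1 elsewhere), two rounds of cuts give four rectangles with loads 8, 10, 10 and 16. The coarse grid limits how evenly load can be divided; a finer heatmap gives the cuts more room to balance. Unlike cutting a Hilbert ordering into ranges, the shard boundaries here follow the load rather than positions along the curve.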
Yes, that's also what I thought. Searching for "same size k-means" yields a simple postprocessing step to even out the clusters produced by the usual k-means algorithm.
EDIT: this one adapts the k-means algorithm itself rather than post-processing its output: https://elki-project.github.io/tutorial/same-size_k_means
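A rough sketch of the post-processing variant, assuming 2D points and squared Euclidean distance: run ordinary k-means, then reassign points greedily so that cluster sizes differ by at most one. Points are visited in order of how strongly they prefer their nearest center over their farthest one, in the spirit of the ordering described in the linked ELKI tutorial. This is a simplified stand-in, not ELKI's implementation: it skips re-sorting as clusters fill and skips any swap-based refinement. The names `kmeans` and `balanced_assign` are hypothetical.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def dist2(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def mean(group: List[Point]) -> Point:
    return (sum(p[0] for p in group) / len(group), sum(p[1] for p in group) / len(group))


def kmeans(points: List[Point], k: int, iters: int = 20, seed: int = 0) -> List[Point]:
    """Plain Lloyd's k-means; returns the cluster centers."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: dist2(p, centers[j]))].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers


def balanced_assign(points: List[Point], centers: List[Point]) -> List[int]:
    """Greedy post-processing: relabel points so cluster sizes differ by at most one."""
    n, k = len(points), len(centers)
    capacity = [n // k + (1 if c < n % k else 0) for c in range(k)]

    def benefit(i: int) -> float:
        # Nearest minus farthest distance: most negative = strongest preference.
        ds = [dist2(points[i], c) for c in centers]
        return min(ds) - max(ds)

    labels = [-1] * n
    for i in sorted(range(n), key=benefit):
        # Take the nearest center that still has room.
        for c in sorted(range(k), key=lambda j: dist2(points[i], centers[j])):
            if capacity[c] > 0:
                labels[i] = c
                capacity[c] -= 1
                break
    return labels
```

Because the capacities sum to the number of points, every point gets a label. Centers are not recomputed after rebalancing; a fuller version would alternate reassignment and center updates until the labels stop changing.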