Nice! Now I wonder when 36 vs. 8 machine instructions becomes a bottleneck. I have seen space-filling curves used in quasi-Monte Carlo integration; the difference could be significant there.
Hilbert curves are used a lot in graphics too. Heck, the old SGI Octane with VPro graphics used a recursive Hilbert curve rasterizer. They also show up a lot today in geospatial big data, since Hilbert addresses make good shard keys.
I suspect that most production applications of Hilbert curve ordering would work just as well with Z order (a.k.a. Morton order), with the additional benefit of being simpler to reason about (just interleave/de-interleave the bits).
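To make "just interleave/de-interleave the bits" concrete, here is a minimal sketch of 2D Z-order encoding for 16-bit coordinates, using the usual shift-and-mask bit spreading. The names `spread_bits`, `compact_bits`, `morton_encode`, and `morton_decode` are made up for illustration and do not come from any particular library.

```c
#include <stdint.h>

/* Spread the low 16 bits of v so that they land on the even bit positions. */
static uint32_t spread_bits(uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Inverse of spread_bits: gather the even-position bits into the low 16 bits. */
static uint32_t compact_bits(uint32_t v)
{
    v &= 0x55555555u;
    v = (v | (v >> 1)) & 0x33333333u;
    v = (v | (v >> 2)) & 0x0F0F0F0Fu;
    v = (v | (v >> 4)) & 0x00FF00FFu;
    v = (v | (v >> 8)) & 0x0000FFFFu;
    return v;
}

/* Z-order (Morton) index: x occupies the even bits, y the odd bits. */
uint32_t morton_encode(uint16_t x, uint16_t y)
{
    return spread_bits(x) | (spread_bits(y) << 1);
}

/* Recover (x, y) from a Z-order index produced by morton_encode. */
void morton_decode(uint32_t code, uint16_t *x, uint16_t *y)
{
    *x = (uint16_t)compact_bits(code);
    *y = (uint16_t)compact_bits(code >> 1);
}
```

Both directions are branch-free and take a handful of shifts and masks each, which is a large part of why Z order is easy to reason about. Which coordinate goes in the even bits is a convention; swapping them mirrors the curve along the diagonal.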
I haven’t ever seen any convincing benchmarks or other analysis where the Hilbert curve showed a notable performance advantage over Z order. The only time you really need it is when moving along the linearized coordinate must never produce jumps in the multidimensional coordinates, and I’m not convinced there are many, if any, real-world cases where that matters. (Note that in either case, small movements in the multidimensional coordinates can be associated with large jumps in the linearized coordinate.) If the only goal is to minimize memory fetches etc., then Z ordering works just fine.
(If you know any good comparisons where the Hilbert curve comes out ahead, I’d be curious to read them.)
https://github.com/leni536/fast_hilbert_curve
So far I have only implemented the index->XY calculation. It compiles to 36 instructions without any branches and takes up 86 bytes.
https://github.com/leni536/fast_hilbert_curve/wiki/How-effic...
I think I can apply the same tricks for the inverse function too.
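For readers who want a baseline to compare against, here is a sketch of the classic iterative index->XY conversion for a 2D Hilbert curve. This is the well-known textbook loop with one rotate/flip step per level, not the branchless code from the repository above. The function name `hilbert_d2xy` and the `order` parameter (the grid is 2^order by 2^order) are assumptions chosen for this example.

```c
#include <stdint.h>

/* Map a Hilbert index d in [0, 4^order) to (x, y) on a 2^order x 2^order grid.
 * Textbook iterative version: consumes two bits of d per level, from the
 * least significant end, rotating/flipping the partial result as it goes. */
void hilbert_d2xy(uint32_t order, uint32_t d, uint32_t *x, uint32_t *y)
{
    uint32_t n = 1u << order;   /* side length of the grid */
    uint32_t t = d;
    uint32_t px = 0, py = 0;

    for (uint32_t s = 1; s < n; s *= 2) {
        uint32_t rx = 1u & (t / 2);   /* quadrant bit for x at this level */
        uint32_t ry = 1u & (t ^ rx);  /* quadrant bit for y at this level */

        /* Rotate/flip the sub-square built so far (side length s). */
        if (ry == 0) {
            if (rx == 1) {
                px = s - 1 - px;
                py = s - 1 - py;
            }
            uint32_t tmp = px;  /* swap x and y */
            px = py;
            py = tmp;
        }

        px += s * rx;
        py += s * ry;
        t /= 4;
    }

    *x = px;
    *y = py;
}
```

Unlike Z order, consecutive indices here always map to cells that share an edge, which is the "no jumps" property discussed earlier in the thread. The price is the data-dependent rotate/flip at every level, which is exactly the kind of branching a branchless implementation tries to eliminate.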