Hacker News new | ask | show | jobs
by aguynamedben 697 days ago
An interesting variant of space-filling curves + dimensionality reduction is Geohash (https://en.wikipedia.org/wiki/Geohash, http://geohash.org/) which takes a lon/lat and uses a Z-curve approach to produce a hash such as `u4pruydqqvj` representing the location. This hash value is basically "how far along the space-filling curve is the lon/lat located". You're reducing two dimensions (lon/lat) into one dimension (how far along the space-filling curve).

There's a unique side-effect to Geohashes in that the value (`u4pruydqqvj`) can have it's end "lopped off" (i.e. cut down to `u4pru`) and it still represents a less precise, but generally accurate representation of the original lon/lat in most cases (when the curve isn't near the edge of the 2d map!). This allows you to index locations (lat/lon) using a string ('u4pru') which opens you up to doing b-tree, range queries, etc. in traditional database, with one field.

Just a rad math quirk! I'm not an expert, and it's a very dense book, but if someone really wants to get into this kind of stuff the "Bible" is "Foundations of Multidimensional and Metric Data Structures" by Hanan Samet.

5 comments

They're also useful for representing IPv4 space. Tom7 had a nice video where he does this after pinging the whole (IPv4) internet [0]

[0] https://www.youtube.com/embed/JcJSW7Rprio at about 6 minutes, though I recommend the whole thing.

> [I]t's a very dense book, but if someone really wants to get into this kind of stuff the "Bible" is "Foundations of Multidimensional and Metric Data Structures" by Hanan Samet.

You're not Kidding! 1022 pages, with a TOC that nests 4 layers deep. In particular, section 2.1.1.2 "Ordering Space" covers space-filling curves, and true to your description it covers every point brought up in the comments so far:

- Peano-Hilbert curve of OP

- Z-order curve mentioned

- "Reverse locality" issues

- Performance considerations of the mapping function

- Even 11 exercises for this section alone!

What a beastly reference. Thank you for sharing.

The thing about SFC addressing is that two places near each other always share a prefix, and the closer they are, the longer the common prefix. It's sort of like Gray coding. Quadtree addressing, for example, also has the property that you mentioned (as do, of course, normal lat/long coordinates!) but two adjacent locations may not have similar addresses at all if they happen to straddle a subdivision boundary. (Again, compare to "normal" numbers where, say, |2000-1999| = 1 but there's no common prefix at all!)
The same happens, in fact, for space-filling curves. Sure, in _most_ of the area covered by the curve, closeness of points means closeness on the curve. But in this Hilbert curve:

https://upload.wikimedia.org/wikipedia/commons/7/7c/Hilbert-...

Consider the two points on the curve straddling the middle of the top edge of the square. They are very close together, but their 1D addresses on the curve are very far apart (one close-ish to the beginning, and one close-ish to the end)!

There ain't no free lunch.

Yes, you’re right, of course.
The boustrophedonic version of Rosenberg-Strong function is my favorite because it doesn't have any such jumps and has better locality preserving qualities than most alternatives.

See https://hbfs.wordpress.com/2018/08/07/moeud-deux/.

Protomaps (https://protomaps.com/) uses this for their PMTiles HTTP range request lookups too right (https://docs.protomaps.com/pmtiles/)?
Geohash just appears to be obfuscated zorder/morton, and being able to truncate the bits etc is just normal SFC stuff.