Hacker News new | ask | show | jobs
by Mr_P 2531 days ago
I'd be curious to see a more thorough comparison to S2, which IMHO seems simpler (it's just quadtrees) and likely faster (it supports O(1) lookup vs. needing a hierarchical search).
5 comments

It is complicated, and it depends on the use case. There are roughly three dimensions to what you are optimizing the representation for: presentation, computational geometry, and decomposition (sharding). S2 and H3 are both fundamentally cartography-driven representations, primarily optimizing for presentation. S2 focuses a bit more on sharding and H3 a bit more on computational geometry, there is quite a bit of literature on the tradeoffs of their characteristic designs. If the core application is not presentation driven, such as pure spatiotemporal analytics, neither of these representations are good choices.

Representation systems for geospatial data models is an amazingly deep theoretical rabbit hole. Common systems are almost always optimized for presentation as most were originally designed for cartographic use cases. If you were looking at representation systems optimized for fast, scalable geospatial analytics, for example, you'd use some type of 3-space embedding representation. There is a lot of diversity.

What is an example of a 3-space embedding or interesting literature? I'm having difficulties googling the term.
A 3-space embedding is a representation optimized for efficient decomposition and computational geometry, ideal for scale-out analytics. This is an interesting design problem in that you can't achieve both with a single surface and they are mathematically incompatible (one requires a discrete surface, the other requires a real surface). A 3-space embedding is a dual surface representation engineered to make it easy to move between the surfaces as required by code. As the name implies, you are logically embedding a standard 2-spheroid in a synthetic discrete 3-space and both coordinate systems can be used simultaneously. Presentation requires computing a projection of some sort.

Unlike single-surface representations, these have the advantage of being essentially free of computational edge cases if you design them correctly. They are also amenable to implementations that are extremely computationally efficient to use, which is a bit of an afterthought for most presentation-optimized designs but important for high-scale geospatial analytics.

A common reflexive criticism of these representations is that they use equal volume sharding, which means that sharding them is not a good approximation of equal area on the embedded surface. An equal area decomposition only makes sense in the context of presentation (e.g. tiling) because the underlying data distribution is naturally extremely and unpredictably skewed, leading to non-uniform cell loading no matter how you decompose it. The assumption that equal area decomposition helps to ensure uniform cell loading is trivially false in practice, making it a non-optimization. Therefore, any competent implementation always requires a separate mechanism for ensuring uniform loading independent of the decomposition model.

The term of art for all of this is discrete global grid systems (commonly "DGGS"). The vast majority of the literature is focused on presentation optimized systems, and the design of single-surface representations, but other types of representations are discussed. It has a very rich taxonomy. I have an article I've been sporadically writing which I should probably finish that steps through the design of a state-of-the-art 3-space embedding representation system for scale-out analytics, based on a (currently stalled) effort to produce a formal standard for industry. A good 3-space embedding has a relatively simple description and implementation but there is much technical subtlety as to why it is designed a specific way.

I'm guessing he means stuff like voronoi tesselation, which isn't limited to 3-space. Look at the books of Hanan Samet for more on this stuff: http://www.cs.umd.edu/~hjs/
H3 and S2 are kinda similar, except by using hexagons, H3 grid centroids are equidistant - rectangles have different distances from center to center.

H3 also has efficient means for finding a cell’s neighbors, and comes with some nice algorithms - like the “compact” fill. See https://uber.github.io/h3/#/documentation/overview/use-cases

The hierarchical search data is built into a grid’s ID in both H3 and S2, which helps when comparing ID’s to see how close they are to each other.

For visualization, I prefer the way H3 looks over S2. While that’s just an opinion, H3 grid is exposed directly to drivers and in lots of tools.

I work for Uber and have used H3, but don’t work on the library.

> For visualization, I prefer the way H3 looks over S2. While that’s just an opinion, H3 grid is exposed directly to drivers and in lots of tools.

Ah, yes this makes a lot of sense. The video in the link actually does have a nice comparison at 17:30, where this is called out. It seems to me to be the most compelling argument for hexagons.

The critical advantage of h3 vs. s2 is that all neighbors are equidistant from the central cell. Also, the implementation of the projection means there is less distortion than s2.

I use this library every day and absolutely love it.

This is the same reason astronomers like HEALPix tiling for skymaps [0]. Equal areas-per-pixel (important for integrating over areas) with straightforward spherical harmonics calculations and hierarchical extensions that can store images at multiple resolutions by subdividing base cells (LIGO uses one such hierarchical strategy for our low-latency gravitational wave source direction estimates [1]).

[0] https://healpix.sourceforge.io

[1] https://arxiv.org/abs/1508.03634

Thanks for sharing. What do you use it for? Did you use S2 previously?
>it supports O(1) lookup vs. needing a hierarchical search

Bear in mind that H3 only has 16 levels of resolution, so traversing from top level down to the square-metre level is still a constant time operation i.e. O(16)

Seeing as Uber used to use S2, presumably it was worse for their needs.