by jandrewrogers 2532 days ago
It is complicated, and it depends on the use case. There are roughly three dimensions to what you are optimizing the representation for: presentation, computational geometry, and decomposition (sharding). S2 and H3 are both fundamentally cartography-driven representations, primarily optimizing for presentation. S2 focuses a bit more on sharding and H3 a bit more on computational geometry; there is quite a bit of literature on the tradeoffs of their characteristic designs. If the core application is not presentation-driven, such as pure spatiotemporal analytics, neither of these representations is a good choice.

Representation systems for geospatial data models are an amazingly deep theoretical rabbit hole. Common systems are almost always optimized for presentation, as most were originally designed for cartographic use cases. If you were looking at representation systems optimized for fast, scalable geospatial analytics, for example, you'd use some type of 3-space embedding representation. There is a lot of diversity.


What is an example of a 3-space embedding or interesting literature? I'm having difficulties googling the term.
A 3-space embedding is a representation optimized for efficient decomposition and computational geometry, ideal for scale-out analytics. This is an interesting design problem in that you can't achieve both with a single surface: the two goals are mathematically incompatible (one requires a discrete surface, the other requires a real surface). A 3-space embedding is a dual-surface representation engineered to make it easy to move between the surfaces as required by code. As the name implies, you are logically embedding a standard 2-spheroid in a synthetic discrete 3-space, and both coordinate systems can be used simultaneously. Presentation requires computing a projection of some sort.
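
To make the idea a bit more concrete, here is a rough sketch of moving between the two surfaces. It is only an illustration under my own assumptions, not the specific design described above: it uses the standard WGS84 geodetic-to-ECEF conversion for the real surface and a plain integer cube grid for the discrete 3-space. The function names and the 1 km cell size are hypothetical.

```python
import math

# WGS84 ellipsoid constants (standard published values).
WGS84_A = 6378137.0                  # semi-major axis, metres
WGS84_F = 1 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)   # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, h=0.0):
    """Real surface: map a point on the spheroid to Cartesian 3-space (metres)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Prime vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + h) * math.sin(lat)
    return x, y, z


def cell_of(point, cell_size_m=1000.0):
    """Discrete surface: integer cube coordinates of the cell containing a point.

    The cube grid and cell size are assumptions chosen for illustration.
    """
    return tuple(math.floor(c / cell_size_m) for c in point)


p = geodetic_to_ecef(0.0, 0.0)   # equator at the prime meridian
print(p, cell_of(p))
```

Geometry is done on the real coordinates and sharding on the integer cell keys, with no projection needed until something has to be drawn.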

Unlike single-surface representations, these have the advantage of being essentially free of computational edge cases if you design them correctly. They are also amenable to implementations that are extremely computationally efficient to use, which is a bit of an afterthought for most presentation-optimized designs but important for high-scale geospatial analytics.

A common reflexive criticism of these representations is that they use equal volume sharding, which means that sharding them is not a good approximation of equal area on the embedded surface. An equal area decomposition only makes sense in the context of presentation (e.g. tiling) because the underlying data distribution is naturally extremely and unpredictably skewed, leading to non-uniform cell loading no matter how you decompose it. The assumption that equal area decomposition helps to ensure uniform cell loading is trivially false in practice, making it a non-optimization. Therefore, any competent implementation always requires a separate mechanism for ensuring uniform loading independent of the decomposition model.
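
As a small illustration of why load balancing has to be handled separately from cell shape, here is a sketch that splits equal-volume cube cells only where the data is dense. This is one possible mechanism that I am assuming for the example (an adaptive octree-style split), not necessarily what a production system would use; `split_by_load` and its parameters are hypothetical names.

```python
from collections import defaultdict


def split_by_load(points, origin, size, capacity, max_depth=16):
    """Recursively halve a cube cell until each leaf holds <= capacity points.

    Returns a list of (origin, size, points) leaves. Empty child cells are
    not emitted. max_depth guards against duplicate points that can never
    be separated.
    """
    if len(points) <= capacity or max_depth == 0:
        return [(origin, size, points)]
    half = size / 2
    children = defaultdict(list)
    for p in points:
        # Each axis contributes one bit: 0 for the lower half, 1 for the upper.
        key = tuple(int(p[i] >= origin[i] + half) for i in range(3))
        children[key].append(p)
    leaves = []
    for key, pts in children.items():
        child_origin = tuple(origin[i] + key[i] * half for i in range(3))
        leaves.extend(
            split_by_load(pts, child_origin, half, capacity, max_depth - 1)
        )
    return leaves


# Skewed toy data: three points clustered in one corner, one far away.
pts = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.3, 0.3, 0.3), (0.9, 0.9, 0.9)]
for cell_origin, cell_size, cell_pts in split_by_load(pts, (0.0, 0.0, 0.0), 1.0, 2):
    print(cell_origin, cell_size, len(cell_pts))
```

The leaves end up with different sizes because the data is skewed, which is the point: the load per cell is driven by the data distribution, not by whether the cells have equal area or equal volume.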

The term of art for all of this is discrete global grid systems (commonly "DGGS"). The vast majority of the literature is focused on presentation-optimized systems and the design of single-surface representations, but other types of representations are discussed. It has a very rich taxonomy. I have an article I've been sporadically writing, and should probably finish, that steps through the design of a state-of-the-art 3-space embedding representation system for scale-out analytics, based on a (currently stalled) effort to produce a formal standard for industry. A good 3-space embedding has a relatively simple description and implementation, but there is much technical subtlety as to why it is designed a specific way.

I'm guessing he means stuff like Voronoi tessellation, which isn't limited to 3-space. Look at the books of Hanan Samet for more on this stuff: http://www.cs.umd.edu/~hjs/