| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by jandrewrogers 2530 days ago

A 3-space embedding is a representation optimized for efficient decomposition and computational geometry, ideal for scale-out analytics. This is an interesting design problem in that you can't achieve both with a single surface and they are mathematically incompatible (one requires a discrete surface, the other requires a real surface). A 3-space embedding is a dual surface representation engineered to make it easy to move between the surfaces as required by code. As the name implies, you are logically embedding a standard 2-spheroid in a synthetic discrete 3-space and both coordinate systems can be used simultaneously. Presentation requires computing a projection of some sort.

Unlike single-surface representations, these have the advantage of being essentially free of computational edge cases if you design them correctly. They are also amenable to implementations that are extremely computationally efficient to use, which is a bit of an afterthought for most presentation-optimized designs but important for high-scale geospatial analytics.

A common reflexive criticism of these representations is that they use equal volume sharding, which means that sharding them is not a good approximation of equal area on the embedded surface. An equal area decomposition only makes sense in the context of presentation (e.g. tiling) because the underlying data distribution is naturally extremely and unpredictably skewed, leading to non-uniform cell loading no matter how you decompose it. The assumption that equal area decomposition helps to ensure uniform cell loading is trivially false in practice, making it a non-optimization. Therefore, any competent implementation always requires a separate mechanism for ensuring uniform loading independent of the decomposition model.

The term of art for all of this is discrete global grid systems (commonly "DGGS"). The vast majority of the literature is focused on presentation optimized systems, and the design of single-surface representations, but other types of representations are discussed. It has a very rich taxonomy. I have an article I've been sporadically writing which I should probably finish that steps through the design of a state-of-the-art 3-space embedding representation system for scale-out analytics, based on a (currently stalled) effort to produce a formal standard for industry. A good 3-space embedding has a relatively simple description and implementation but there is much technical subtlety as to why it is designed a specific way.