Hacker News new | ask | show | jobs
by digsmahler 2858 days ago
> Instead of projecting the Earth directly onto a single square, it projects the Earth onto the 6 faces of a cube enclosing the Earth and then applies an extra non-linear transformations to reduce even more the deformations. Each cell in s2 is in fact part of one of six quadtrees that describe the whole planet.

That's a super cool detail! I once implemented a 2D index using a Z-Order curve that directly translated lat/lon coordinates to a linear ordering. It works well enough because nobody really lives at the poles--the search regions with a single projection get really obtuse there. Projecting the earth onto a 6-sided die is a really elegant solution to that problem! Go go Google engineering!

1 comments

There is a subtle concept at work here that many people don't know but which manifests in geospatial data models: the representation you use to shard data should be homeomorphic to the intrinsic topology of the data model. Using cartographic projections is popular but does not meet this criteria, and it does eventually break for non-trivial geospatial data models. A cube, on the other hand, is homeomorphic to a spheroid (like a donut is to a coffee cup) and therefore capable of practically representing much more complex data models.

That said, using a cube projection has its own set of limitations and issues for advanced geospatial analytics even though it is well-behaved for sharding. Current best practice representations embed a spheroid in a synthetic 3-space and shard the 3-space, which has few edge cases to worry about and is very efficient in time and space.

> the representation you use to shard data should be homeomorphic to the intrinsic topology of the data mode

Why is that?

Some valid relationships in the data model may not be representable in the sharding scheme. As a simple example, this is why many projection-based sharding schemes do not allow geometries in the polar regions. For any sufficiently rich analytical data model, you will eventually run into data or derived relationships that can't be properly represented in the system.

On a more practical level, non-homeomorphic representations also create a large number of additional edge cases that need to be handled to ensure correctness, many of which are obscure and non-obvious. In most implementations (including open source), developers tend to ignore many of these defects because they rarely affect simple mapping applications -- the reason they used a projection based representation in the first place is because it was easy. For complex, massive-scale geospatial analytics, customers have an uncanny ability to find these edge cases almost immediately.

I spent some time doing adapting some geometry algorithms to work with geospatial coordinates a few years ago. The problem with the poles is that latitude converges to 90 degrees whereas longitude degrees are all over the place if you move even slightly. This combined with precision issues with floating point math causes all sorts of issues.

A good example is a simple algorithm I did to draw circles on a map by turning them into polygons: https://github.com/jillesvangurp/geogeometry/blob/master/src...

This works perfectly fine if you stay away from the poles but if you get close enough the circles become a bit irregular. The algorithm tries to work around some of the issues but the results don't look pretty.

Other issues I encountered were several datasources with invalid degrees due to rounding errors. This is an issue along the dateline (180 degrees longitude). E.g. 180.0000001 degrees is invalid.

Another fun edgecase in geo is null Island, a fictional island of the coast of Africa at (0,0) that has become a fun little easter egg in many datasources. A friend of mine dedicated this website to it: https://www.vicchi.org/2014/04/05/welcome-to-the-republic-of...

Because of time bombs like "It works well enough because nobody really lives at the poles--"

That's exactly the sort of thing that works well until one day it just breaks your code because you forgot to make sure never to to do a computation on a thing that includes on of the poles. And it won't happen because someone starts living there rather it will happen because of a complicated logical chain of reasons that make perfect sense, but only in hindsight.