Hacker News new | ask | show | jobs
by tel 2867 days ago
> the representation you use to shard data should be homeomorphic to the intrinsic topology of the data mode

Why is that?

2 comments

Some valid relationships in the data model may not be representable in the sharding scheme. As a simple example, this is why many projection-based sharding schemes do not allow geometries in the polar regions. For any sufficiently rich analytical data model, you will eventually run into data or derived relationships that can't be properly represented in the system.

On a more practical level, non-homeomorphic representations also create a large number of additional edge cases that need to be handled to ensure correctness, many of which are obscure and non-obvious. In most implementations (including open source), developers tend to ignore many of these defects because they rarely affect simple mapping applications -- the reason they used a projection based representation in the first place is because it was easy. For complex, massive-scale geospatial analytics, customers have an uncanny ability to find these edge cases almost immediately.

I spent some time doing adapting some geometry algorithms to work with geospatial coordinates a few years ago. The problem with the poles is that latitude converges to 90 degrees whereas longitude degrees are all over the place if you move even slightly. This combined with precision issues with floating point math causes all sorts of issues.

A good example is a simple algorithm I did to draw circles on a map by turning them into polygons: https://github.com/jillesvangurp/geogeometry/blob/master/src...

This works perfectly fine if you stay away from the poles but if you get close enough the circles become a bit irregular. The algorithm tries to work around some of the issues but the results don't look pretty.

Other issues I encountered were several datasources with invalid degrees due to rounding errors. This is an issue along the dateline (180 degrees longitude). E.g. 180.0000001 degrees is invalid.

Another fun edgecase in geo is null Island, a fictional island of the coast of Africa at (0,0) that has become a fun little easter egg in many datasources. A friend of mine dedicated this website to it: https://www.vicchi.org/2014/04/05/welcome-to-the-republic-of...

Because of time bombs like "It works well enough because nobody really lives at the poles--"

That's exactly the sort of thing that works well until one day it just breaks your code because you forgot to make sure never to to do a computation on a thing that includes on of the poles. And it won't happen because someone starts living there rather it will happen because of a complicated logical chain of reasons that make perfect sense, but only in hindsight.