| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by macmccann 3085 days ago

For anyone else wondering, like I did, why you would use a hexagonal indexing system instead of a (roughly square) latitude longitude system, from https://eng.uber.com/elk/:

"Data size matters in Elasticsearch; a large index means too much data, caching churn, and poor response time. When query volumes are large, ELK nodes become overloaded, causing long garbage collection pauses or even system outages.

To address this, we switched to hexagonal queries, dividing our maps into hexagonal cells. Each hexagonal cell has a string ID determined by the hexagon resolution level. A geodistance query can be roughly translated to a ring of hexagon IDs; although a hexagonal ring is not a circular surface, it is close enough for our use case. Due to this adjustment, our system’s query capacity more than tripled."

6 comments

jandrewrogers 3085 days ago

Discrete Global Grid Systems (DGGS), which is what this touches on, are a deceptively complex topic. You are optimizing for many performance dimensions, and some things that intuitively seem like they should be optimized for don't really matter, and other things people tend to ignore (like congruency) matter quite a bit.

For example, "equal area" subdivision of the surface is not a particularly useful property. This seems like it is wrong on its face such that most people try to achieve it but you have to remember that equal area only matters if your data is uniformly and predictably distributed. Geospatial data models are neither in a pretty severe way as a rule, which means you'll have to deal with data load asymmetry regardless via some other mechanism. If you can ignore equal area because it is handled by some other mechanism, it opens up other possible surface decompositions that have much stronger properties in other ways.

Compactness of representation and cost of computing relationships on the representation has a large impact on real-world index performance that is frequently ignored. The poorness of lat/lon grids in this regard is often significantly underestimated. A singularity-free 3D DGGS can have an indexing structure that is an order of magnitude smaller than a basic 2D lat/long grid for addressing, and the storage on disk of records encoded in these DGGS are also significantly smaller. This all adds up to a lot of performance.

Hexagonal grids tend to work particularly well for visualization. However, they do have their own significant weaknesses e.g. it is typically not a good representation for join operations and they are relatively expensive to search at large scales relative to some other DGGS tessellation systems.

espeed 3085 days ago

Hi Andrew - You went to school for chemistry and chemical engineering. I'm still experimenting with ideas for binary codes of succinct/implicit geometric representations for similarity graph embeddings of non-spatial property data like text and numbers. I keep bumping into the crystallography literature -- the traditional lattice structures and also quasicrystal / fibonacci chain representations. How much of your chemical engineering background have influenced your designs?

jandrewrogers 3085 days ago

Almost no influence as related to spatial structure; I tend to view spatial structure as an information theoretic manifestation.

On the other hand, chemical engineering had a big influence on how I reason about distributed systems. That discipline is essentially about the design of complex, continuous flow, coordination-free distributed computation systems that are robustly stable in an efficient equilibrium. It maps directly to computer science but has a concept of the problem space that I think is much more refined than what you commonly see in computer science though it is never expressed in computer science terms. But it makes sense, it is chemical engineering’s One Job.

javajosh 3085 days ago

>robustly stable in an efficient equilibrium

I would like to read more about what this means and how it applies to distributed systems. I have a physics background, but I sense that chemists tend to have a much more intricate (and interesting) conception of "stability".

refset 3084 days ago

I found it interesting to discover that there's a significant amount of cutting-edge HPC development along these lines for meteorology and climate modeling: https://www.hpcwire.com/2017/06/22/gpus-power9-figure-promin...

dvanduzer 3085 days ago

For those interested in the variety of uses for hexagonal / hierarchical indexing, Dr. Kevin Sahr at Southern Oregon University has produced an open specification and a number of ancillary research around the topic: http://www.discreteglobalgrids.org/information/

There are several options for partitioning your grid, and the geometric consequences on your index. This overview in particular should be accessible even without a deep GIS background: http://webpages.sou.edu/~sahrk/sqspc/pubs/xuzhouKeynote13.pd...

elihu 3085 days ago

That's interesting. It looks like the ideas are similar to a Haskell library I wrote awhile back to tile a sphere with hexagons (and a few pentagons):

https://wiki.haskell.org/IcoGrid

My tiling had a fixed resolution that you specify up-front; I didn't really consider how to make my grid hierarchical.

scdoshi 3085 days ago

S2, linked from the README[1], does the same hierarchical subdivisions in squares, with each square having a string ID.

Squares divide into sub-squares exactly, but hexagon's don't sub-divide into hexagon's neatly.[2]

Can someone explain the advantages of hexagons over squares in this use case?

[1]https://code.google.com/archive/p/s2-geometry-library/ [2]https://www.illustrativemathematics.org/content-standards/ta...

jacobolus 3085 days ago

Gosper island shapes do subdivide into gosper island shapes, if you want something tricky: https://en.wikipedia.org/wiki/Gosper_curve#Properties

dvanduzer 3085 days ago

See sibling post for links, but the short answer: you can use 12 pentagons, strategically interspersed. Like a (soccer) football.

edit: Oops, that only explains the tiling. For explanations of how the hierarchy descends, Dr. Sahr produced some GIFs that give a little bit of an overview of how the resolutions overlay: http://webpages.sou.edu/~sahrk/dgg/images/topogif/topogif.ht...

scdoshi 3085 days ago

Thanks for that link.

Also found this which goes into some more detailed comparison with rectangular grids:

https://github.com/uber/h3/blob/master/docs/doxyfiles/usecas...

malandrew 3085 days ago

Boundaries between all 6 neighbors are equal. With S2, four are on the sides and top are equal and the corner neighbors have only a single point as their boundary.

This means you can better simulate dynamic systems where there is flow between cells.

adrianratnapala 3085 days ago

Of course a squareish grid can also be used to approximate a disc or ring. The approximation is not as pretty, but I don't see how it makes much difference for a query system.

jacobolus 3085 days ago

A triangular grid (or if you like, a grid of hexagonal pixels) is more efficient and closer to isotropic than a square grid. Circles pack more closely in a hexagonal arrangement than a square one.

Guidii 3085 days ago

So there are a few reasons to favour hexagons (or triangles) over squares....

Squares don't pack as closely as hexes, meaning that you can get 18% more hexes into a space than squares for a given perimeter-size. Squares also degenerate badly over the surface of the sphere.... you can't keep a consistent size as you change latitude, and they turn into triangles when you reach the poles.

timmaxw 3084 days ago

Google S2 (a square-based alternative to Uber's H3) works by projecting the surface of the sphere onto the surface of a cube, then subdividing each face of the cube. So there's still some distortion around the edges and corners of the cube, but nowhere near as bad as a naive longitude/latitude strategy.

b0rsuk 3085 days ago

I thought it was to make best use of all those hexagonal monitors ?