by flippmoke 2362 days ago
Author of Mapbox's Vector Tile specification here, and also a contributor to some of the code that is used by PostGIS. I wanted to add some clarity on a few topics around Vector Tiles and serving them dynamically, which seems to be a new trend.

The Vector Tiles specification was designed for map visualization and has since expanded into other uses, but in general its purpose is to quickly provide a complete subset of data for a specific area in a form that is highly cacheable. Most of this speed and cacheability is gained by preprocessing all the data you will use in your map into tiles.

The general steps for turning raw data into Vector Tiles are:

1. Determine a hierarchy of your data. For example, if you are talking about roads, at some zoom levels you will want to see only highways or major roads, while at other zoom levels you will want all your data.

2. For each tile at each zoom level: select your data following your hierarchy rules, simplify your data based on your zoom level (for example, you might need fewer points to display your road), then clip your data to your tile and encode it into your Vector Tile.
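The two steps above can be sketched roughly as follows. This is only an illustration under loud assumptions: the MIN_ZOOM table, the feature dictionaries, and the "keep every k-th vertex" simplification are all made up for the example, the clip simply drops vertices outside the tile instead of cutting segments at the tile edge, and the final encoding step is left out. Real tools do each of these far more carefully.

```python
import math

# Hypothetical hierarchy rule: lowest zoom at which each road class appears.
MIN_ZOOM = {"highway": 0, "major": 8, "minor": 12}

def tile_bounds(z, x, y):
    """Return (west, south, east, north) in degrees for an XYZ tile."""
    n = 2 ** z
    def lat(row):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))
    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    return (west, lat(y + 1), east, lat(y))

def select(features, z):
    """Keep only the features the hierarchy allows at zoom z."""
    return [f for f in features if MIN_ZOOM[f["class"]] <= z]

def simplify(points, z, max_zoom=14):
    """Crude stand-in for real simplification: keep every k-th vertex
    plus the last one, with k growing as the zoom level drops."""
    step = max(1, 2 ** max(0, (max_zoom - z) // 2))
    kept = points[::step]
    if points and kept[-1] != points[-1]:
        kept.append(points[-1])
    return kept

def clip(points, bounds):
    """Naive clip: drop vertices that fall outside the tile bounds."""
    west, south, east, north = bounds
    return [(lon, lat) for lon, lat in points
            if west <= lon <= east and south <= lat <= north]

def build_tile(features, z, x, y):
    """Select, simplify, and clip features for one tile (encoding omitted)."""
    bounds = tile_bounds(z, x, y)
    tile = []
    for f in select(features, z):
        pts = clip(simplify(f["points"], z), bounds)
        if pts:
            tile.append({"class": f["class"], "points": pts})
    return tile
```

Calling build_tile for every (z, x, y) you intend to serve is the preprocessing loop; the expensive parts in practice are the simplification and clipping this sketch glosses over.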

The problem is that doing these steps is often very complex. It requires thought about the cartography of your final map, and it can also drastically affect performance. If you are dynamically serving tiles from PostGIS, it is in some cases very hard to reduce large quantities of data quickly. For example, take a very detailed, very precise coastline of a large lake that you want to serve dynamically. If you serve this data on demand, then each time you need a tile you have to simplify and clip a potentially massive polygon. While this might work for single requests, as you scale up it quickly adds a lot of load to a PostGIS server. The only solutions are to cache the resulting tiles for a longer period to limit load on your database, or to preprocess all your data before serving.
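As a minimal sketch of the caching option, here is an in-memory cache with a time-to-live. Everything in it is an assumption for illustration: the TileCache name, the render callable (a stand-in for whatever runs the expensive PostGIS query), and the one-hour default. In a real deployment this role is more often played by an HTTP cache or CDN in front of the tile endpoint.

```python
import time

class TileCache:
    """In-memory tile cache keyed by (z, x, y) with a time-to-live."""

    def __init__(self, render, ttl_seconds=3600.0, clock=time.monotonic):
        self._render = render      # hypothetical: runs the expensive query
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # (z, x, y) -> (expires_at, tile_bytes)

    def get(self, z, x, y):
        key = (z, x, y)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]          # still fresh: no database work
        tile = self._render(z, x, y)
        self._entries[key] = (now + self._ttl, tile)
        return tile
```

The longer the TTL, the less load reaches the database, at the cost of users seeing older data.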

Preprocessing all the tiles is something existing tiling tools such as tippecanoe are already really good at, and it comes with the benefit of helping you determine a hierarchy for your data. Preprocessing might seem excessive when it means making potentially millions of tiles, but in general it makes your application faster because it is simply serving an already created tile.
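To put a number on "millions of tiles": in the usual XYZ scheme each zoom level has four times as many tiles as the one before, so full world coverage adds up quickly. The helper names below are made up for the example, and a real tileset covering a smaller area or skipping empty tiles would be smaller than these worst-case totals.

```python
def tiles_at_zoom(z):
    """Number of tiles covering the whole world at zoom z (2^z by 2^z)."""
    return 4 ** z

def tiles_through_zoom(max_zoom):
    """Total tiles for zoom levels 0 through max_zoom inclusive."""
    return sum(tiles_at_zoom(z) for z in range(max_zoom + 1))

print(tiles_through_zoom(10))  # 1398101
print(tiles_through_zoom(14))  # 357913941
```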

Therefore, if your data does not change very quickly, I would almost always suggest preprocessing over dynamic rendering of tiles. If you start using PostGIS to create tiles on demand instead of existing tiling tools, you might spend more effort maintaining it than you expect.

2 comments

Very good comment and thanks for your work on MVT. I use PostGIS's MVT tools on a daily basis.

I do an intermediate approach: my queries are sometimes too expensive to run dynamically, and my data change semi-frequently (daily/weekly basis), but when they do change I have a clear idea of what tiles are affected. So any time my data needs updating I can mark tiles as stale and then I have a sidekiq job that processes them and uploads them to S3. The tile server itself pulls from S3.

This is probably not quite as fast as a dedicated tile server, but it's far more reliable/responsive than dynamic rendering and reduces load spikes on the database.
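The "mark tiles as stale" step could be sketched like this: given the bounding box of whatever changed, list every tile that overlaps it at each zoom level you serve. The function names and the (west, south, east, north) bbox convention are assumptions for the example, it uses the standard Web Mercator XYZ tile math, and it does not handle boxes that cross the antimeridian.

```python
import math

def lonlat_to_tile(lon, lat, z):
    """Return the (x, y) XYZ tile containing a lon/lat point at zoom z."""
    n = 2 ** z
    lat = max(min(lat, 85.0511), -85.0511)  # Web Mercator latitude limit
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def stale_tiles(bbox, min_zoom, max_zoom):
    """Return the set of (z, x, y) tiles overlapping a changed bbox."""
    west, south, east, north = bbox
    stale = set()
    for z in range(min_zoom, max_zoom + 1):
        x0, y0 = lonlat_to_tile(west, north, z)  # top-left tile
        x1, y1 = lonlat_to_tile(east, south, z)  # bottom-right tile
        for x in range(x0, x1 + 1):
            for y in range(y0, y1 + 1):
                stale.add((z, x, y))
    return stale
```

The resulting set is what a background job would then re-render and upload.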

So I saw this post earlier today and tried it on a dataset we have (fixed boundaries w/ some properties that change 4x/hr). We use the value of the properties for styling of the vector tiles. Currently the tiles are re-rendered every 4hrs (even though the data is updated every 15 min) using tippecanoe, served by tileserver-gl and cached in CloudFront. So I wanted a way to get new data to users faster. But as you have noted, the dynamic process Crunchy posted IS SLOW: it takes about 3 minutes to paint the world on my brand new MacBook Pro (about 3 seconds w/ pre-rendered). Given the country boundaries do not change very often, is there a way to change just the properties that actually need updating in the already rendered vector tiles? Our pipeline takes about 45 min to run completely to regenerate the new tiles with updated properties. Or is there a better way to present this data? We started out w/ GeoJSON directly from the DB but the files were huge; the vector tiles are 30% the size of the GeoJSON. We were in the MTS private beta but they didn't have the 'update' process worked out yet, so it was a full refresh each time.

We will be releasing incremental updates to Tilesets API Beta (MTS) here shortly. Reach out to us again and we can talk about having you test it out!

I haven't done this, but I imagine you could add a service worker with a fetch event listener, which puts you in front of the raw tile data being cached. https://github.com/mapbox/mapbox-gl-js/issues/4326

From there you can serialize/deserialize the whole tile and map a new field (annoying), or if you're clever... map your variable value fields lower in the values index array of the vt pbf. That way, assuming you have a small number of unique style-by values, you could get away with simply replacing a single byte representing that style value field with another value dictating a different style, for each feature in the vector tile.

That might be a little too abstract, so tl;dr version: put a listener in front of fetch. One byte represents the target dynamic field in each feature in the tile (if you have a small number of unique values). Replace that single byte with your desired target byte.
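To make the index idea concrete: in a Vector Tile layer, each feature's tags are a flat list of (key index, value index) pairs pointing into the layer's keys and values arrays, and protobuf varints encode any number from 0 to 127 in a single byte. The sketch below works on an already decoded tags list and is only an illustration of the logic. The function names are made up, and locating and overwriting the matching byte inside the raw pbf buffer (the actual trick described above) is not shown.

```python
def fits_in_one_byte(index):
    """Protobuf varints encode values 0..127 in exactly one byte."""
    return 0 <= index < 128

def restyle_tags(tags, key_index, new_value_index):
    """Return a copy of a feature's tags with one key's value index swapped.

    tags is the decoded flat list [k0, v0, k1, v1, ...] of indices into the
    layer's keys and values arrays.
    """
    if len(tags) % 2 != 0:
        raise ValueError("tags must be key/value index pairs")
    if not fits_in_one_byte(new_value_index):
        raise ValueError("new index would not encode as a single byte")
    out = list(tags)
    for i in range(0, len(out), 2):
        if out[i] == key_index:
            out[i + 1] = new_value_index
    return out

# Example layer (made up): keys = ["name", "style"],
# values = ["low", "medium", "high", "Lake X"]
# A feature named "Lake X" styled "low" has tags [0, 3, 1, 0].
print(restyle_tags([0, 3, 1, 0], key_index=1, new_value_index=2))  # [0, 3, 1, 2]
```

Keeping the style values at the low end of the values array is what keeps both the old and the new index within one byte, so the swap does not change any encoded lengths.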