Hacker News new | ask | show | jobs
by Stratoscope 36 days ago
One task where GeoJSON falls down is simplification of a group of polygons with common boundaries, e.g. the 48 conterminous US states. If you start with a highly detailed set of polygons, you need to simplify them for practical display in an online map.

GeoJSON doesn't encode the fact that the boundary points are common between adjacent polygons. When you simplify those polygons, each one is handled separately, and you end up with "slivers" where the boundaries are misaligned:

https://www.bing.com/images/search?q=map+slivers+betwen+poly...

TopoJSON solves this by encoding each such boundary only once. So when you simplify the polygons, they are all done together, and the same simplification applies to adjacent polygons. No more slivers!

https://github.com/topojson/topojson

https://github.com/topojson/topojson-simplify

5 comments

GeoJSON is not TopoJSON. Saying that is "falling down" is like criticizing a zebra for not being a giraffe. GeoJSON is a mapping of the (non-topological) "simple features" model into JSON, full stop. It does that fine.
Yes, the same "slivers" problem occurs when you try to simplify features in any format that uses individual polygons, such as shapefiles or whatnot. That's the only case I was referring to.

I don't think I would trust a zebra or a giraffe for this task either.

Is this actually GeoJSON falling down, or decades of convention extended to JSON? Topology is great, but it is sidestepped by Shapefile/WKT/WKB/etc, in favor of independent primitives like POINT, LINE, POLYGON. If GeoJSON did not exist as a new JSON GIS data format encoding these primitives, TopoJSON would not have "replaced" it, due to the added mis-match with other non-topological formats.

From what I can tell, the top criticism of GeoJSON is the under-enforced winding order specification, and crossing the antemeridian.

Right. Encoding a union algorithm into the data structure just introduces the reverse problem: Selecting a subset now requires extra logic beyond jq.
Similarly, typical map APIs like the Google Maps API accept GeoJSON and not TopoJSON. I was not suggesting TopoJSON as a replacement for GeoJSON, but as a complement to it. With the tools on the TopoJSON GitHub, you can have GeoJSON input and output, but convert to TopoJSON for the simplification step to avoid the "slivers" problem.
How is that a geojson problem? If your dataset is correct, adjacent borders will just use the same points and will match exactly.
The problem is simplification. Suppose two regions share a border with some nonlinear points a, b, c, d. Simplifying the polygon for the first region might yield a, b, d while the second yield a, c, d. This creates gaps or overlaps between the two regions.
But what is the border? Set the border to what it actually is, not a simplification of it. The state of Colorado is formally a 697 sided polygon, don't simplify it to a rectangle.
This is not what OP is describing. It is very common to simplify objects for decreasing boundary objects by orders of magnitude. GeoJSON is missing correlation when you do that. Simplifying country objects from a GeoJSON source could lead to a gap between the country borders. So you either have poor representation or a longer pipeline to convert objects to an amenable object set. It also breaks idempotency in some regards.
To do the simplification, you detect shared borders, simplify and generate polygons again. That doesn’t make topojson inherently superior. You can convert back and forth and for many applications geojson is easier to process.
Yes, you could write code to do that. Or use the utilities provided in the TopoJSON GitHub and let them do it for you: convert to TopoJSON, simplify, convert back to GeoJSON. They have already written all the code for you.
It depends on what purpose you are using the polygons. In an online map you need to simplify way down. Consider these Colorado maps at two different zoom levels:

https://maps.app.goo.gl/JH93ko96QcoLXuBJ9

https://maps.app.goo.gl/au53iTnsmNdFuEZV8

Even the one zoomed in on the state appears to use maybe 15-20 vertices max.

In the second one, if I squint real hard I can just barely make out one slight dogleg on the western border and one on the south. And that is partly because I knew to look for them in the zoomed-in map.

If we use, say, the Census TIGER/Line boundary definitions for the states, we are probably talking about hundreds of thousands of vertices, perhaps millions. You won't be using those in an online map without simplifying.

The Texas border with Mexico is formally down the centerline of the Rio Grande, even as the river moves (ignoring fiddly complications). Even if you could somehow take a perfect snapshot of it at a given time, you'd run into the coastline paradox when sampling it.
So don’t simplify the shapes on their own. Geojson is a storage and exchange format, you can still convert it to other formats if you want to modify it.
I think what the original comment is pointing out is that GeoJSON lacks a concept of a shared boundary. Shared boundaries expressed in GeoJSON get around that by duplicating data. Whenever data is duplicated, there's a risk that the copies will not be exactly the same. That makes the task of modification more challenging given that the real world is full of messy data, like duplicates not matching.

20-25 years ago I worked a lot with map data from otherwise high quality, and sometimes authoritative, sources like the USGS and NOAA that had this non-identical shared boundaries problem (in formats other than GeoJSON). If the format doesn't allow such mistakes to be expressed, then they have to fix their data to publish it in said format.

Sure, but not every format is useful for everything. Geojson is great if you want a simple way to express a shape to show on a map. It’s like criticizing CSV because people put strings in choice value fields instead of doing a foreign key to another table. That’s just not what the format is used for.
I'd take your point further... No format is useful for everything. But we have to be aware of the trade-offs of each format (or language or tool or ...) in order to make the right choice of what to use for a given use case. We do that by sharing knowledge of where a given tool succeeds and where it falls down. Pointing out something a format doesn't handle well is not condemning that format for all use cases (I happily choose GeoJSON over other formats for many things).
Partly true, but don't blame GeoJSON. Blame the data model.

GeoJSON is firmly rooted in the "simple features" model of spatial data. Sometimes called the "vector data model", this is ubiquitous in GIS. Each geographic entity (aka "Feature") has a single geometry and many non-geometry attributes. Each feature is independent.

The vector data model (for better or worse) is the basis of many systems because it fits the tabular/relational style so closely. What is a feature but a row in a table plus a special column describing its geometry? Topological relationships are ignored by design.

TopoJSON, ESRI coverages, the internal OpenStreetMap data model, and a few others are notable exceptions. They explicitly handle spatial relationships, at the cost of making the model unintelligible to row-based processing.

I like TopoJSON and have used it in projects. But it's weird to set it up as opposition to GeoJSON. It's a complement. GeoJSON is a general data format meant to replace uses of ESRI Shapefiles and other complex formats. TopoJSON is more of a solution for a particular application need.

Is there much work developing or using TopoJSON these days? I haven't seen much about it in a few years.

To be clear, I'm not suggesting TopoJSON as an alternative to GeoJSON. I like GeoJSON and was loosely involved with the working group that created and updated its spec.

I'm just saying that for the specific task I mentioned GeoJSON or any format such as shapefiles that store polygons individually naturally leads to the "sliver" problem.

A nice processing pipeline is:

1. Convert GeoJSON to TopoJSON.

2. Run the simplification on the TopoJSON.

3. Convert the resulting TopoJSON back to GeoJSON.

The TopoJSON GitHub has tools for each of these steps.