Hacker News new | ask | show | jobs
by jandrewrogers 418 days ago
> it's a lot of training to get people to up to speed on coordinate systems, projections, transformations, etc

This can mostly be avoided entirely with a proper spheroidal reference system, computational geometry implementation, and indexing. Most uses of geospatial analytics are not cartographic in nature. The map is at best a presentation layer, it is not the data model, and some don’t use a map at all. Forcing people to learn obscure and esoteric cartographic systems to ask simple intuitive questions about geospatial relationships is a big part of the problem. There is no reason this needs to be part of the learning curve.

I’ve run experiments on unsophisticated users a few times with respect to this. If you give them a fully spheroidal WGS84 implementation for geospatial analytics, it mostly “just works” for them anywhere on the globe and without regard for geospatial extent. Yes, the software implementation is much less trivial but it is qualitatively superior UX because “the world” kind of behaves how people intuit it should without having to know anything about projections, transforms, etc. And to be honest, even if you do know about projections and transforms, the results are still often less than optimal.

The only issue that comes up is that a lot of cartographic visualization toolkits are somewhat broken if you have global data models or a lot of complex geometry. Lots of rendering artifacts. Something else to work on I guess.

2 comments

Im inclined to agree, but unfortunately a huge amount of the existing data and processes in this space does not assume a spheroidal earth and come provided with a coordinate reference system. Ultimately there are also some domains where you got data that you explicitly don't want to interpret using spheroidal semantics, e.g. when working with a city plan - in which case the map _is_ the data model, and you definitely want the angles of a triangle to sum up to 180.
Similarly I see it as it an inevitable that you will deal with a problem related to these issues, then have a massive blocker if you aren't already familiar with the details because debugging requires more than a cursory understanding for hard problems. In the alternative, you don't know the problems exist and the code is broken. I don't think it's as simple as abstracting away the entire concept, which is what I would say is too high level like I mentioned above. I don't know the right answer here honestly, I think it will be disruptive when someone figures it out
> the software implementation is much less trivial

Aren't most geospatial tools just doing simple geometry? And therefore need to work on some sort of projection?

If you can do the math on the spheroidal model, ok you get better results and its easier to intuit like you said, but it's much more complicated math. Can you actually do that today with tools like QGIS and GDAL?

Many do use simple geometry. This causes endless headaches for people who are not cartographers, they don’t expect that. The good geospatial tools usually support spheroidal models but it is not the default, you have to know to explicitly make sure it uses that (many people assume that is the default).

An additional issue is that the spheroidal implementations have undergone very little optimization, perhaps because they are not the defaults. So when people figure out how to turn them on, performance is suddenly terrible. Now you have people that believe spheroidal implementations are terribly slow, when in reality they just used a pathologically slow implementation. Really good performance-engineered spheroidal implementations are much faster than people assume based on the performance of open source implementations.

For what it's worth, you _can't_ use spherical approaches for most data. They're only used for points, in practice. Your spatial data is inherently stored/generated in ways that don't allow spherical approaches as soon as you start working with polygons, let alone things like rasters.

Yes, spherical representations of polygon data exist, but the data you import has already been "split" and undoing that is often impossible, or at best non-trivial. And then rasters are fundamentally impossible to represent that way.

Analysis uses projections for that reason. Spherical approaches aren't fundamentally "better" for most use cases. They're only strictly better if everything you're working with is a point.

There's more to geospatial than point datasets.

A good point. Certainly for raster analysis it doesn't make sense.

But any type of vector data could be modeled on a sphere, right? Points, shapes, lines. And I saw "better" because even the best suited projection will have some small amount of distortion.

Either way, most things use planer geometry so projections are necessary, and you need to have some understanding of how all that works

You can model polygons on a sphere, but the issue is that the data you're starting with is already in a cartesian representation. You actually can't easily convert between the two for complex geometries in cases where they cross the antimeridian/poles. So trying to do anything other than points is difficult in practice, unless you're natively generating data from scratch in a spherical representation, which is rate.
This is not really a problem, unless you’re trying to simulate some 3D space orbits, physics. The crossover from geo INFORMATION systems to geo simulation systems is a bit rough, but the projections and calculations on projected cartesian space are enough for many typical questions, like distance, area, routing. However, even topology support starts getting specialized, and the use cases are more niche. I think it’s asking a bit too much from a database/storage layer to do efficient calculations outside of those supported by GEOS. At this point, you might want to import the relevant data into higher level applications.
Speaking for myself, I was not referring to any kind of simulation systems. This is a standard requirement of many operational geospatial data models, and there are a lot of these in industry. Anything that works from a projection is a non-starter, this causes demonstrable issues for geospatial analysis at scale or if any kind of precision is required. Efficient calculation just means efficient code, there is nothing preventing this from existing in open source beyond people writing it. Yes, you may be able to get away with it if your data model is small, both geographically and data size, but that does not describe every company.

It is entirely possible to do this in databases. That is how it is actually done. The limitations of GEOS are not the limitations of software, it is not a particularly sophisticated implementation (even PostGIS doesn’t use it for the important parts last I checked). To some extent you are affirming that there is a lack of ambition in this part of the market in open source.

I wouldn't say it's correct to say that GEOS isn't particularly sophisticated. A lot of (certainly not all) GEOS algorithms are now ported from JTS, the primary author of which is Martin Davis (aka Dr JTS), who works at Crunchy Data, who provide the PostGIS extension. So the chain (again, mostly) goes JTS -> GEOS -> {PostGIS, Shapely} -> … . Martin's work is at the cutting edge of open-source GIS-focused computational geometry, and has been for a long time (of course, industry has its own tools, but that's not what we're talking about).

I can sort of see your point about the merits of global, spheroidal geometry, certainly from a user's perspective. But there's no getting around the fact that the geometry calculations are both slower (I'm tempted to say "inherently"…) and far more complex to implement (just look at how painful it is to write a performant, accurate r- or r*-tree for spherical coordinates) along every dimension. That's not going to change any time soon, so the projection workflow probably isn't going anywhere.

You're right in that pretty much anything can be done via an API exposed via a database function. However, as they say... if it can be done, does mean it should? Now, I agree that having more sophisticated multi-dim calculations would be cool, but I've just rarely ran into needing or even wanting to do this, over many projects, some involving accurate simulations. In practice, database has always been for storing and querying data, which can be extremely accurate. I am probably the first person to abuse SQL and I've written some 3D ECEF rotation code in SQL, but it was a hack for a deadline, not because it was the right thing to do. All the projects I've worked with, had external models or components that did the "precise work" using complex code that I would never dare to make dependent on any database.

I'm actually curious, speaking for yourself, what kind of analysis you're doing where something like NAD83, or UTM does not give you enough precision? Is this actually "real world" geospatial data? If I have a soil model, I have a very localized analysis, and if I have a global climate model, we're talking kilometers for grid cells. In all these cases, the collected data has built in geolocation error MUCH grater than most decent projections...

So, what analysis are you doing where you need centimeter precision at global scale of thousands of kilometers? Sounds really interesting. The only time I've seen this, is doing space flight simulations where the error really accumulates into the future.