The entire article can be summed up as: “OSM stores maps as graphs, in flat files where each line is either a node, an ordered list of nodes, or metadata. The graph nodes can be arbitrarily ordered in OSM files, which leads to computational complexity when parsing them. This is not a bad thing, since it means that the spec for OSM files can be extremely simple, which makes it easy for people to contribute to OSM. Other mapping formats optimized for parsing speed require a lot of irrelevant fluff that makes them much harder for human contributors to understand.”
Ironically, 95% of this article is irrelevant fluff that does not make it any easier for the reader to understand.
> OSM stores maps as graphs, in flat files where each line is either a node, an ordered list of nodes, or metadata. The graph nodes can be arbitrarily ordered in OSM files, which leads to computational complexity when parsing them. This is not a bad thing, since it means that the spec for OSM files can be extremely simple, which makes it easy for people to contribute to OSM.
That's actually a sensible design. Treat user-facing stored data as a user interface. If you need efficient processing of that data, such as fast parsing, you can always build it elsewhere, for example by caching the data in an intermediate structure that is recompiled whenever the user data changes.
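As a rough sketch of what I mean, in Python: everything here is an assumption for illustration (the `load_with_cache` name, the `.cache` suffix, and using modification time plus size as the "has it changed" check), not something any OSM tool actually does.

```python
import os
import pickle
from typing import Any, Callable


def load_with_cache(source_path: str, parse: Callable[[str], Any]) -> Any:
    """Return parse(source_path), reusing a pickled result while the source is unchanged."""
    cache_path = source_path + ".cache"  # hypothetical naming convention
    stat = os.stat(source_path)
    # Assumption: mtime plus size is a good enough signal that the file changed.
    fingerprint = (stat.st_mtime_ns, stat.st_size)

    # Fast path: reuse the cached structure only if it was built from this file state.
    try:
        with open(cache_path, "rb") as f:
            cached_fingerprint, data = pickle.load(f)
        if cached_fingerprint == fingerprint:
            return data
    except (OSError, pickle.UnpicklingError, EOFError, ValueError, TypeError):
        pass  # missing or unreadable cache: fall through and rebuild

    # Slow path: parse the human-friendly file once, then store the result.
    data = parse(source_path)
    with open(cache_path, "wb") as f:
        pickle.dump((fingerprint, data), f)
    return data
```

The slow parser only runs when the source file is new or has changed; every other call pays just the cost of reading the pickle. The trade-off is that the first access is never faster than a plain parse.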
Wait, the proposed solution to a data format being slow to parse is to work around the bad performance by caching the already parsed representation? That seems like it has a clear flaw if you’re only accessing the data once…
Accessing any given data once. When the total dataset is tens to hundreds of gigabytes, having to download any significant fraction of it to do data processing is really unfortunate.
But seriously, what's up with this total disdain for anyone trying to build applications with OSM data? You don't seem to care whether parsing is near-instant or, as other commenters have mentioned, literally the majority of total processing time for certain compute jobs.
Thanks! I read the article, I read the post the article is responding to, I read all the comments and still I had no real idea what it all was about until I read your comment.
It could be an example of an author assuming a general audience already knows the insider information, but then I don't know who the target audience really was. This is the kind of thing that probably should have been spelled out in the introduction, with a link to something like this: