by moksly 1865 days ago
Here’s the thing about public data from the perspective of the public sector.

We operate 300+ different systems; some are tiny, some are major. Some are built as “modern” web-fronts with healthy APIs behind them, some are older than you and run on a mainframe behind multiple layers of APIs that have been added over the years as demands changed. Most are built and maintained through public procurement processes, and some have been through several different companies and a myriad of different approaches to how you build software. Some were built by companies that haven’t existed for a decade, designed by people who are retired. Many share data but none of them agree on data models. Even if you have natural enterprise architecture descriptions of what a specific data model for a “person” should look like when it goes in or out of a system, that’s not a guarantee that it’s also what a “person” actually looks like inside the system. Many suppliers make a lucrative “additional content” business out of selling BI modules that translate their data into something our analytics departments can use, and as such, have a vested interest in not making the data too accessible on its own. Many are clever enough to set up contracts that don’t give us access to our own data without buying it as an option, or often, as several options for the various parts of the data. On top of this, almost none of these systems were (or are) designed to have clever data models or good documentation. That costs money, meaning your bid will likely lose the procurement process, but it also makes it easier for another company to “outbid” you in the next procurement process should you win.

Anyway, the point is that our data is a mess, as this article points out. What’s worse, it’s not just a mess, it’s a lot of different messes. We map our public toilets in our GIS systems, and so do the 97 other communes of Denmark, but because we don’t have the same GIS systems, and because we don’t have the same data access agreements, you could easily end up with 98 different data sets: some being “never-updated” XML files, others JSON, others txt files, and some being REST/SOAP/you-name-it APIs. This article is completely right when it tells you that this sort of sucks. In theory you could use the data to make an app that directed you to the nearest public toilet in Denmark, but good luck with the state of the data.

Here is the biggest issue that Open Data faces. The political decision-making layer doesn’t really care that Ronald Reagan released the GPS data “and look what happened”, not when they are distributing funding and have to decide whether they want to spend money on “good Open Data” or more teachers.

4 comments

Maybe I'm naive, but this seems to be in a similar class of problem to Linux's support for physical peripherals, and the answer seems to be the same: drivers that convert from a mess of different interfaces into a standard interface one layer up.

And like driver development, it can be done by pretty much anyone. Doesn't have to be the low-budget data providers, nor does it all have to be done by one group, as long as the various groups doing the work can agree on a standard interface.
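A minimal sketch of that "driver" idea, using the public-toilet example from the parent comment. Everything here is an assumption for illustration: the `Toilet` record, the three payload layouts, and the adapter names are hypothetical and do not describe any real commune's data.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Toilet:
    """Hypothetical shared record that every adapter must produce."""
    name: str
    lat: float
    lon: float


def from_json(text):
    # Assumed layout: a list of objects with "name", "lat", "lon" keys.
    return [Toilet(d["name"], float(d["lat"]), float(d["lon"]))
            for d in json.loads(text)]


def from_xml(text):
    # Assumed layout: <toilets><toilet navn="..." y="..." x="..."/></toilets>
    root = ET.fromstring(text)
    return [Toilet(e.get("navn"), float(e.get("y")), float(e.get("x")))
            for e in root.iter("toilet")]


def from_csv(text):
    # Assumed layout: semicolon-separated rows of name;lat;lon, no header.
    rows = csv.reader(io.StringIO(text), delimiter=";")
    return [Toilet(name, float(lat), float(lon)) for name, lat, lon in rows]


# One adapter ("driver") per source format; new formats only add an entry here.
ADAPTERS = {"json": from_json, "xml": from_xml, "csv": from_csv}


def load_all(sources):
    """Run each (format, payload) pair through its adapter and merge the results."""
    merged = []
    for fmt, payload in sources:
        merged.extend(ADAPTERS[fmt](payload))
    return merged


# Made-up sample payloads standing in for three different providers.
SOURCES = [
    ("json", '[{"name": "Town Hall", "lat": "55.68", "lon": "12.57"}]'),
    ("xml", '<toilets><toilet navn="Harbour" y="56.15" x="10.21"/></toilets>'),
    ("csv", "Station;55.40;10.39\n"),
]

TOILETS = load_all(SOURCES)
```

Anything consuming `TOILETS` only sees the shared record, so each adapter can be written and maintained by a different group, as long as they agree on what a `Toilet` looks like.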

This is a good idea, but unless *you* have some political capital and room in the budget to design AND perform AND maintain the process long term... You'll only add another competing "standard" to the mess. https://xkcd.com/927/

Up until you mentioned Denmark, I was 100% sure you talked about my home country, Finland.

Wouldn't most of the data you mentioned eventually become part of projects like OpenStreetMap?

The only reason to not make something public is because you're trying to hide something.

Even if it's just that 'your data is a mess'... you want to hide the fact that your data is a mess.

Data is either open, or you're trying to hide something. You're lying about something. Period.

This is not true - there are plenty of other reasons.

- it requires more effort than not releasing it
- it requires approvals from other branches that are difficult to receive
- your staff don’t have the expertise (yes, this is a real thing some places!) or budget

I’m sure there are many other reasons too.

TFA is arguing that raw, unsupported data that is difficult to consume is neither 'public' nor 'open'.

Regarding hiding data, there can be many good justifications for that. Carelessly sharing data with the public is not particularly safe or wise.