Hacker News new | ask | show | jobs
by roenxi 1859 days ago
Postgres is bowing to the inevitable, JSON support is too much in demand.

But this is going to be a classic example of bad design. Databases are a bad place to be storing JSON, which is a good interface and a bad storage standard. It is pretty easy to see how JSON will play out: some bright young coder will use JSON because it is easier, then over the course of 12 months discover the benefits of a constrained schema, and then have a table-in-a-table JSON column.

It isn't so out there to think that ongoing calls for JSON support will lead Postgres to re-implement tables in JSON. We've already got people trying to build indexes on fields inside a JSON field.

This is needless complexity engineered by people who insist on relearning schemas from scratch, badly, rather than trusting the database people who say "you need to be explicit about the schema, do data modelling up front".

8 comments

> Postgres is bowing to the inevitable

I think PostgreSQL has always been very pragmatic. It's supported JSON natively since 9.2 (Sep 2012).

> Databases are a bad place to be storing JSON

You're right that "mature" features and projects have a very good understand of the schema. But not everything is that.

Suppose I want to collect info from the Github API about a bunch of repos. I can just store the entire JSON response in a table and then query it at my leisure.

There's also something to be said for contiguous access. Joining tons of little records together has performance problems. Composite types and arrays can also fill this void, but they both have their own usability quirks.

I use the json features of postgres to turn json into relation data. Store all json messages received in a table, then use a materialized view to extract the relevant parts into columns. Works well, and lets me keep the original data around.
roger that.
> Databases are a bad place to be storing JSON, which is a good interface and a bad storage standard.

That's why in 99% of cases, Postgresql uses jsonb as storage standard, which is binary and compressed.

> This is needless complexity engineered by people who insist on relearning schemas from scratch

No, this is the right tool for situations where schemas are polymorphic, fluid, or even completely absent (like raw third-party data). I love SQL and following normal forms, and it is the right tool for most situations, but not all.

I've learned and applied my schema normalization and what have you got. But it's not the be-all and end-all of good engineering. What I greatly appreciate about hierarchical value storage in contrast to related flat records, is that it is so much easier to store and retrieve a tree. No need to generate ids and insert rows one by one, no need to decode the result of large joins. Because it doesn't only take time to write code for that, it can contain errors too.

If you've got hierarchical data and you just want to store, update and retrieve it as a whole (which is my use case), JSON is a good choice. Granted, it could be stored as a string/blob in my case. I don't really need to search within.

JSON in Postgres is a bit like a nail gun. Used correctly, it's incredibly useful. But in inexperienced hands (and lacking good technical leadership), it's easy to shoot yourself in the thigh.

You don't even need JSONB to commit war crimes on a Postgres database. There's many things that Postgres can do, but probably shouldn't be done:

- Storing "foreign keys" in an array column, instead of using a join table

- Storing binary files as base64 encoded strings in text columns

- Using a table with `key` and `value` string columns instead of using redis

- Pub/sub using NOTIFY/LISTEN - Message queueing

- Other forms of IPC in general

- Storing executable code

- God tables

Even when trying to use Postgres appropriately, plenty of engineers don't get it right: unnecessary indices, missing indices, denormalised data, etc.

This isn't unique to Postgres, or relational databases in general. Any form of storage can and will be used to do things it's not designed or appropriate for. You can use as easily use S3 or Elasticsearch for message queuing, and can even find official guides to help you do so. Go back 20 years or so, and you can find implementations of message busses using SOAP over SMTP.

The problem isn't JSONB (or any other feature). It's bad engineering. Usually it's an incarnation of Maslow's Hammer: when all you have is a hammer, everything looks like a nail.

> Storing "foreign keys" in an array column, instead of using a join table

Very bad idea for a base table, sure. OTOH, potentially great idea for a (possibly materialized) view (as might eagerly storing an array of row values instead of keys.)

> Storing binary files as base64 encoded strings in text columns

That might not be a bad idea depending on size, and depending on how often you needed the binary vs. a base64 encoded string: if most of the use of the binary is in a context where it will be sent as base64 encoded, storing it that way might be a great idea.

> Using a table with `key` and `value` string columns instead of using redis

If you need an in-memory cache, sure. Otherwise...you could use redis, but I’m not sure why it would always be preferred.

> Pub/sub using NOTIFY/LISTEN - Message queueing

NOTIFY/LISTEN are a pub/sub mechanism. You shouldn’t use them alone as an application level message queueing system, but you definitely can build such a system on PG and might well use NOTIFY/LISTEN in the implementation.

> - Storing executable code

There are probably problems that involve specific instances of doing this, but this is at best to general to describe a thing yoi shouldn’t do.

Yeah re the binary base64 encoding - we found that when using text based queries (the default for pg gem on ruby and most other client libraries AFAIK), base64 in a text column outperformed byte columns as it avoids extra conversions to and from base64.

We did switch to using binary queries though (by passing result_format as 1), and at that point we did see a speedup from using bytea columns - but it’s quite a bit of work to get the type mapping correct then so probably not worth it in most cases

> Databases are a bad place to be storing JSON, which is a good interface and a bad storage standard.

JSON makes perfect sense in a database that already supports all of: BLOB, TEXT, XML, ARRAY, and composite datatypes, including any datatype, including those on this list, for members of ARRAY and composites.

OTOH, Postgres has had XML since v8.2 (2006) and JSON since 9.2 (2012), and “tables in <supported structured serialization format>” hasn’t happened yet, even as discussion item AFAIK, so perhaps it would be bad, but even so it seems to be just fantasizing something to worry about.

The native JSON data type was introduced with PG 9.2 in 2012.
I know. And it isn't a new argument either, I for one have been making it for years.
> This is needless complexity engineered by people who insist on relearning schemas from scratch, badly, rather than trusting the database people who say "you need to be explicit about the schema, do data modelling up front".

The reason is with some projects/data it's hard to be explicit about the schema which is why NoSQL had it's popularity phase.

Now most applications don't have either entirely structured or entirely unstructured data, they will have a mix - so it's absolutely brilliant for one tool to do both. If they didn't support JSON I have a strong suspicion that they wouldn't have had some of the growth we have seen for Postgres across the last few years.