Hacker News new | ask | show | jobs
by vog 3574 days ago
I was also wondering about this one.

This may make sense if you have custom fields, i.e. the set of keys is user-defined and ever increasing.

Apart from that, there is no reason to do this. If it is all about avoiding having "too many" columns and/or "too may" null values, then I'd say: Don't worry. Just use as many columns as you need.

I would even go one step further and say: It is a common anti-pattern to introduce generic key-value stores (through a separate table, JSON structures or XML structures) without a compelling reason.

3 comments

I'll add another reason: subtle application issues creep in with value states.

With JSONB I can receive a boolean element with five states: null, true, false, invalid (i.e. actually a string, number or array value), or simply nonexistent, and it's up to the application to deal with all such cases. You may have to treat the database as a source of potentially invalid data that must be sanitized.

With regular DB columns you can reasonably assume that a boolean not null column will a) exist for all records and b) return either true or false. You need only scan the database schema at application start to verify this.

On the other hand, if you're using JSONB for an options structure then the possibility of key nonexistence may actually be useful, since it implies "use system default". (Yes this can differ from the meaning of a NULL value, especially if you are merging options structures).

You might question whether it is wise to have a "boolean" key/value pair with four semantically distinct states. I can only say I have done this and I am not proud of it.

To be fair, you can write CHECK conditions which are arbitrary expressions involving JSONB values. Though at that point you might as well just use a normal column.
By the way, another related anti-pattern is the introduction of an additional table that has a 1:1 relationship to the original table. Because the original table was "full" of columns, or something, so a second table had to be opened to contain the remaining columns.

This is an anti-pattern because you can't model "1:1" that way. Instead, it will be "1:0..1", and now you have some nasty corner cases when the first table has an entry whose counterpart in the second table is missing. Also, when using a column now you always have to think about which table had it - the first or the second table?

I had to work with such a design in a real-world project and it was really annoying.

The way I have seen for the issue of which table is to introduce a view and setup triggers on the view to run a stored proc when you insert or update it to handle all the tables....
Ick. I use 1:0..1 tables commonly. But not 1:1. Ick.
Just a note that NULL columns take up very little space in Postgres. IIRC there is a NULL bitmap for every row. So yes, don't worry and just use columns, if the columns are actually part of your schema.
This is correct. Postgres keeps a bitmap in each row that designates which of the rows are null. Postgres initially allocates one byte for the bitmap so it will work for up to eight columns. Once your table has more than 8 columns, postgres will allocate eights bytes for an additional 64 columns. That means if you have more than 8 columns, you can keep making "optional" fields proper columns until you have a total of 72 columns because null values are completely free.