Hacker News new | ask | show | jobs
by tgarma1234 3577 days ago
The fact that you can pass attributes for a record into the JSONB field without defining the table structure in advance is really the decisive feature because then you never have to bother with changing your data model. For example, if you have contacts streaming into your table from Android devices you don't need to say whether or not there should be a column for "work email" and "home email2" etc etc... you just send everything into a column with key/value pairs and you can put whatever key/value pairs you want in that column. And then query over the keys without inserting a gazillion nulls into your database for rows that don't have a value for a particular key. You can also do updates on the json column AND you get all of the benefits of relational databases with other columns in the same table. I can't really see how anyone would not love this data type now that I have been exposed to it in production.
3 comments

This is an attractive trap. This mode of data modelling is actually quite terrible in terms of maintainability. It is precisely the problem with NoSQL database models.

It doesn't mean there is no data model (schema), and it doesn't mean that the data model is flexible. It actually means you have a succession of distinct and undocumented schemas, which are updated on a haphazard, ad hoc basis, with no documentation or thought given to this event.

Every version of every app and every support program ever written then has to know about each and every historic variant of the data model that ever existed. This is a maintenance nightmare when your app is more than a few iterations old, and when you have several decoupled support systems trying to use the database.

With an overt schema, you are required to at least think about what you're doing and to do it in a centralized fashion, rather than slip changes in haphazardly in any app that ever touches the database, and you're required to ensure that the data already in the database actually conforms to the new schema. You won't have one app that puts the work email in "work-email" with a dash, and another that tries to use "work_email" with an underscore, for example.

I don't agree with that because you really just query the keys for users that have the attribute in question without inserting a ton of nulls and having to work out changes to the attributes when you realize "I wish I would have thought of that column" several iterations later. Spending time up front agonizing about the "overt schema" strikes me as an anti-pattern when you are likely going to pivot later and it's super convenient to just use the same tables as you have been using all along. New users are different in that case but not so totally different so just but the new attributes into the column keyed on "pivot1" or whatever.
There is no need to do that. Should not do that if you have arbitrary values in your database. Just create another table with all the keys and values aligned like that:

    id | other_pk | key | value
This of cause would require another query to the database and you might have to iterate through all retrieved rows in your code to build the hash table manually (although your JSON library will do the same).

Im not saying that you should not use JSONB for that (I probably would too for that example), Im just saying there are more ways to design your data storages.

This well-trodden pattern is called EAV (entity, attribute, value). It's attractive up to the point where you want to do interesting things with multiple attributes, which is when you start doing multiple nested self joins on the EAV table and performance gets hairy.

For smaller scale auxiliary data it's fine, but so is JSONB.

Where can I read more about this?
You think database schemes are somehow immune to evolve haphazardly over time?

I've seen otherwise...

The point is not that the DB schema will evolve sanely, nothing can guarantee that (nor with JSON).

The point is that the schema is explicit: worst case, I can go look it up, and I /know/ that the data conforms to it.

No one is arguing it's not impossible but it's certainly harder to mess up a schema than a document db or JSON field.
Multiple, independent apps poking changes into the database is a kind of failure too. For writer apps > 1 it is often preferable to route them through a middle tier / web API / etc.
Data lives far, far longer than applications. You may not have multiple apps using the database now, but if your system lasts 10 years or more than it certainly will.
I think this approach creates bottlenecks and ultimately has an inner-platform effect.

Use your RDBMS to the max, they are powerful tools!

> "Query over the keys without inserting a gazillion nulls into your database for rows that don't have a value for a particular key".

Having learned that postgresql NULLs are cheap in storage (since they just use a bitmap) I stopped caring much about empty column proliferation.

I've also found that arbitrary user-defined keys are still better treated via a "known keys" table, the long-term visibility into data is invaluable, there's more join flexibility, and there's opportunity to assign metadata (e.g. type hints) for each user-defined key. At scale, these things make json columns look like technical debt.

The only use case I currently have for jsonb is when the absence of an element is meaningfully different to a null value e.g. options structures that wouldn't otherwise benefit from being in normalized tables.

What is a "known keys" table?

Is it a table with columns like user_defined_text_1, user_defined_text_2, ..., user_defined_text_10, user_defined_integer_1, user_defined_integer_2, ... user_defined_integer_n ?

No, that kind of "extension table" is a well-known antipattern being fragile, unscalable, and denormalized. Just a table with columns for id, keyname & metadata about the key will do.
> No, that kind of "extension table" is a well-known antipattern being fragile, unscalable, and denormalized.

Ok, I agree. This is the reason why I was asking :-)

> Just a table with columns for id, keyname & metadata about the key will do.

Isn't this the Entity-Attribute-Value model (which has the well known drawback of requiring a join for each attribute)?

Not exactly, I wouldn't be afraid of the data table having both attribute name & value columns. EAV is practically 6NF which would prohibit that. (to be honest that id field was out of habit, the known-keys table should probably be unique and indexed on the key name)
I'm sorry, but I'm still unsure what you mean. A "known-keys table" is not a common term in the literature. Can you give an example by describing the tables and their columns?
This is good but please, to preserve your own sanity, first design the table struct properly to capture all attributes you can think of in separate columns and add the appropriate primary key. Then add the extra JSON(B) column for unforeseen data. Keep iterating till the table is still small.
You will never think of all the attributes anymore. For example 8 months later you want to score your contacts based on derived or imputed values. Who knows what you will want to slap in there when you scale up to 5 billion rows and start including click data from your website for example... there are basics from relational concepts that are obvious, which is why you are using postgres in the first place but table structure is not one of them.
"I might want to add more columns later" is not a good justification for avoiding a relational model. If you realize that you want an additional column, just run an ALTER TABLE statement.
Totally agreed. Some people have an unreasonable fear of migrations (not to try to put words in the above poster's mouth, just pointing out something that I've noticed). You generally shouldn't worry about migration-related downtime with Postgres if you're not using table constraints, using columns with defaults, or changing a column's type. There are very few migration procedures that require something like a table rewrite these days, or even an index rebuild.
Migrations are a pain in the arse. But not as much as trying to maintain a "schemaless" app.