Hacker News new | ask | show | jobs
by pjlegato 3574 days ago
This is an attractive trap. This mode of data modelling is actually quite terrible in terms of maintainability. It is precisely the problem with NoSQL database models.

It doesn't mean there is no data model (schema), and it doesn't mean that the data model is flexible. It actually means you have a succession of distinct and undocumented schemas, which are updated on a haphazard, ad hoc basis, with no documentation or thought given to this event.

Every version of every app and every support program ever written then has to know about each and every historic variant of the data model that ever existed. This is a maintenance nightmare when your app is more than a few iterations old, and when you have several decoupled support systems trying to use the database.

With an overt schema, you are required to at least think about what you're doing and to do it in a centralized fashion, rather than slip changes in haphazardly in any app that ever touches the database, and you're required to ensure that the data already in the database actually conforms to the new schema. You won't have one app that puts the work email in "work-email" with a dash, and another that tries to use "work_email" with an underscore, for example.

3 comments

I don't agree with that because you really just query the keys for users that have the attribute in question without inserting a ton of nulls and having to work out changes to the attributes when you realize "I wish I would have thought of that column" several iterations later. Spending time up front agonizing about the "overt schema" strikes me as an anti-pattern when you are likely going to pivot later and it's super convenient to just use the same tables as you have been using all along. New users are different in that case but not so totally different so just but the new attributes into the column keyed on "pivot1" or whatever.
There is no need to do that. Should not do that if you have arbitrary values in your database. Just create another table with all the keys and values aligned like that:

    id | other_pk | key | value
This of cause would require another query to the database and you might have to iterate through all retrieved rows in your code to build the hash table manually (although your JSON library will do the same).

Im not saying that you should not use JSONB for that (I probably would too for that example), Im just saying there are more ways to design your data storages.

This well-trodden pattern is called EAV (entity, attribute, value). It's attractive up to the point where you want to do interesting things with multiple attributes, which is when you start doing multiple nested self joins on the EAV table and performance gets hairy.

For smaller scale auxiliary data it's fine, but so is JSONB.

Where can I read more about this?
You think database schemes are somehow immune to evolve haphazardly over time?

I've seen otherwise...

The point is not that the DB schema will evolve sanely, nothing can guarantee that (nor with JSON).

The point is that the schema is explicit: worst case, I can go look it up, and I /know/ that the data conforms to it.

No one is arguing it's not impossible but it's certainly harder to mess up a schema than a document db or JSON field.
Multiple, independent apps poking changes into the database is a kind of failure too. For writer apps > 1 it is often preferable to route them through a middle tier / web API / etc.
Data lives far, far longer than applications. You may not have multiple apps using the database now, but if your system lasts 10 years or more than it certainly will.
I think this approach creates bottlenecks and ultimately has an inner-platform effect.

Use your RDBMS to the max, they are powerful tools!