Hacker News new | ask | show | jobs
by garethrees 3673 days ago
It makes sense to store serialized data structures in the database when these conditions apply:

1. There are no use cases that would require you to SELECT on the fields in the serialized data structures.

2. You anticipate that the data structures are going to change frequently during development, so that turning them into relations is going to involve a lot of schema migrations.

Basically you give up the possibility of being able to SELECT on some of the data in return for being able to change its format rapidly and cheaply.

I worked on a project recently where this was helpful -- when I designed the database schema I didn't know the details of many of the data structures that were going to have to be stored there. From the use cases I could deduce the set of fields that would need to be SELECTed on, but the other fields were ill-defined. By storing them as blobs (actually as JSONB fields, since this was PostgreSQL) I could safely defer the decision about how to design these parts of the database, without incurring lots of schema migrations along the way.

1 comments

Be very careful with this logic. I'm in the middle of cleaning up a project where the original developer did this calculus and stored a lot of metadata as JSON strings in text fields. Now, three years out, we have new reporting requirements that need to be able to filter on some of that metadata.

Experience has taught me that it's next to impossible to know what types of queries will be needed for the entire lifetime of a project. If you absolutely must store data like this, please leave a clear migration path for moving some or all of it into well structured tables when (not if) it becomes necessary.

I understand your concern. Perhaps if I give an example, it will be clear why I followed the approach I did.

The project is an industrial control system. Different kinds of item are processed by the system. Each kind of item needs image acquisition parameters, for example field of view. There's no use case that needs to select kinds of item whose image capture parameters specify a particular field of view, and it's clear that such a use case is very unlikely to arise — the image acquisition parameters are part of the internal specification of the system, not something that any end user is interested in.

At database design time it wasn't settled which model of camera was going to be used, and so it was far from settled which parameters were going to be needed. Using a JSONB field for the image acquisition parameters meant that I could leave the specification of these parameters to the camera programmer, without having to incur a series of database migrations as the correct set of parameters were worked out.

Even in your project, where new reporting requirements eventually arose, it doesn't mean the original decision was necessarily a bad one. Sometimes it makes sense to incur technical debt in order to bring a product to market in a timely fashion.