Hacker News new | ask | show | jobs
by AdrianB1 763 days ago
I would never do that. A delete trigger to a generic archive table that stores the row as a row, this is what I would do. JSON in a RDBMS is something that can be done, but rarely should be done. Why keep it in the database and not as a file on disk (in the filesystem), otherwise?
2 comments

You want to disconnect it from fk constraints and schema because you don't want to manage schema evolution on deleted history.

If you keep it as mirrored schema you'll have to manage individual history table per table (high overhead), manage it on every migration and you'll run into problems sooner or later - you won't be able to migrate it correctly as historic records will be disconnected unlike ordinary ones.

How do you store it as a row in your case? JSON offers a very reasonable trade off; it works forever over all tables in spite of schema changes.
A schema change combined with a soft delete is a very different scenario. RDBMS are a very expensive way to store JSON.

Soft delete purpose is either short term "simulate delete until you are sure you can hard delete" or "hide it and archive it". In the first case long term storage is not a problem, in the second you want to keep it in the database if you want the option to query it (and you want the table structure for that, JSON query is very expensive) or take it out if you don't. At least these are the use cases I saw in 30 years of software engineering.

Keeping deletions must have some answer to schema evolution of first class data.

In case of json the answer is detach.

Performance wise:

- you have access to indexed timestamps - this allows you to narrow down resultsets

- you have indexed or sharded-on table name column - useful for more monolithic databases with many tables and large datasets

- you usually want uniform (string) or variant non-unique, indexed identity fields, ie. composite up to 3 (identity, secondIdentity, thirdIdentity columns)

You don't care about the rest because its schema will vary in time and you don't want to deal with it.

You keep those deletions as second class objects because you won't interact with them as you do with first class data. You keep them for things like audit, down migrations, restore, "we can't delete stuff because we need to keep history for 6 years" etc. in a place that doesn't clutter primary dataset.

If you want typed schema you can add version column and keep version based schemas, manage schema schema version migrations in the background outside of your usual deployment cycle etc - but this becomes quite fancy, usually you don't care, you care about what's in use.