Hacker News new | ask | show | jobs
by fendale 4474 days ago
I agree that rollbacks are hard, but if you think them through, there is usually a way.

For your example, to merge first and last name, you would create a backup first, ie, create a table that contains the current version of first and last name, and if you need to go back, you restore. Now, if you have an application that cannot have any downtime through DB upgrades you could be in trouble - after your backup, more records could come in to the original table. Probably what you need to do is modify the application to start inserting both full_name and first_name, last_name, and then gradually move over all the old data to the new format etc - ie a gradual migration. Every application and example is different, but the problem is usually solvable, often with some amount of effort.

The master schema create script is an idea I've seen in the past, and it works ok-ish up to a point. The problem is you need to update the master script, and also produce scripts to migrate production, so it doubles the effort. For enterprise database developers, I've seen plenty of development approaches that boil down to:

* Create all changes on Dev using GUI tools. * Used TOAD to diff against production and produce delta script. * Apply delta to prod.

Its gets the job done, mostly, so long as someone doesn't drop the dev database which is the only place with the new code - I've saw it happen. Then you have to go and beg the DBA to do a full restore just you get the release back on track!

2 comments

TL;DR It is fundamentally impossible to find a solution that allows one to freely migrate any database schema and the data contained in that database forward and backward without any constraints. In practice you can push those constraints a bit around and most of the time you can hide them in dark corners where nobody looks and nobody gets hurt, but there is no silver bullet and there will never be one. (The last paragraph might be interesting, too.)

I completely agree with you but I was a bit unclear about what I wanted to say. What I really wanted to say is that the problem is fundamentally unsolvable in the general case no matter how hard you try. The pathological case is that from some point in time on you capture less information than you did before - only the full name instead of first and last name.

This is admittedly a relatively rare case in practice but nonetheless possible. Now there is fundamentally no way to migrate data captured past that point back before that point because you just don't have the necessary information and can not make it up out of thin air. As you mentioned it is not a big problem to migrate data captured before that point forward and backward by just keeping the additional information around and therefore making the migration step non-destructive.

In practice it should be a relatively rare case that you have to migrate data forward and backward repeatedly so the relevance of all that is not to high. Most of the time you start capturing additional information or stop capturing no longer relevant information and these cases behave much nicer under schema evolution. But especially when ever you change the way you capture some information you will from time to time get into trouble.

It is instructive to think of data migration in terms of mathematical functions. The function to migrate the data forward is always defined otherwise you could not perform the migration in the first place. But not every function has an inverse function and so you may not be able to migrate the data backwards. You can always make up a function that returns a tuple of the function value together with all arguments and therefore make inverting it trivial - this is exactly what also keeping first and last name around when you actually want to merge them into full name means - but as mentioned above this fails for new information where you only capture the function value and no longer the function arguments.

I think you misunderstood what the OP was talking about. Have you ever looked at migration systems like South for django or the migrations in ruby on rails?

He was saying how the reverse of a migration is night impossible, because what happens if you add a new user after applying that first name + last name migration, for which you don't have the split name.

That's why you apply the approach he was stating - you move slowly. Step 1. insert name column and populate from fn/ln. Step 2. Change code to point to name column as well as populate fn/ln Step 3. remove code pointing to fn/ln. Step 4. remove fn/ln columns.

No loss of data, instant reversion at any point, because each step is non-destructive. It's just longer, harder to do in reality, but definitely the "right" way to do it if you are 100% concerned with data validity.

At step 3 or later reversion becomes impossible. A new record will be added to the DB without a first name, last name making rollback impossible.

Also how is step 4 non destructive?

But at step 3, you've already validated the code change. You know that the name only code works as intended, and you're just cleaning up old code at that point. I do agree that there is a potential for failure at that point, but I'd think it to be reasonably mitigated in comparison to making the change and needing to then roll back.

Step 4 is non destructive from a data perspective, as those columns would be "zombie" data. they'd exist, but nothing would be referencing them at that point, and the data would simply exist in a different form.

If you're talking about representing the data differently, yes that would happen, and in a case like the above, you'd be much better off by preserving normalization and using a view to combine into a "name" entity. Th