Hacker News new | ask | show | jobs
by zzzeek 5237 days ago
Doing data migrations up front in a NoSQL system means you've changed the fields around in your document types, then you're done. Now 95% of your documents are wrong, missing those changes. Maybe you have linkages between different types of document (akin to a foreign key) - now a lot of those might be pointing to nothing.

In this model, it's the application's job to anticipate and work around these inconsistencies. Assertions that check for data integrity, if you have them, have to be modified to work around this. For a lot of the web applications we talk about these days, who cares - it's people's lists of friends and TODO notes. It's simple data and some dangling records aren't going to hurt anyone.

In the SQL world, we instead write migration scripts that migrate the structure and data all at once. This is a little more work up front, but as long as you stuck to a mostly normalized form and use great tools (which we now have) this is not a big deal ("MONTHS" to migrate across slaves sounds like they had some very wrong decisions made earlier on). The application can move straight to supporting the new structure and entirely forget that the old one existed. In this world, we can also have really complex relationships between fields, like when we're storing accounting or medical data linked to temporal records of changes. The application can consume this structure without worrying about consistency.

2 comments

I think the problem is that when you change a table structure in MySQL, it takes quite a bit of time to do the actual writes. Adding a column tends to force a rewrite of the table. Braindead I know....

I think the problem is that MySQL isn't really a standard SQL-world db. It has some of the advantages of one, but not all of them and some very annoying gotchas.

> The application can move straight to supporting the new structure and entirely forget that the old one existed.

Not always. Changing the schema can break an application, in particular when the database supports multiple applications.

> It's simple data and some dangling records aren't going to hurt anyone.

It depends on the application. In healthcare people literally die due to some dangling inconsistent records.

The problem in anticipating and working around these inconsistencies is that the workaround is added ad-hoc in code, rather than through a model defining the data change. You need a model http://chronicdb.com/preparing_schema_changes