I am convinced that data migration is one of the hardest problems in data management and systems engineering. Basically no solution available today satisfies fundamental requirements such as minimizing downtime and guaranteeing correctness. It is _such_ a huge problem that most inexperienced developers see kicking the can down the road with NoSQL document storage as a viable alternative (it isn't; you'll either be migrating data forever and special-casing every old version of your documents, or writing even more convoluted migration logic). It's also clear that even the most modern ORMs and query builders were not built with the issues that arise in migrating data in mind. It would be refreshing to see more research devoted to this problem. Unfortunately, migrations differ so much from each other, with such heterogeneous requirements, that we'll probably be working on this for a really long time.
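To make the "special-casing every old version of your documents" point concrete, here is a minimal sketch of the code that schemaless storage tends to push into the application. The document shapes, the `schema_version` field, and all function names are hypothetical, chosen only for illustration.

```python
from typing import Any, Callable

Doc = dict[str, Any]

def v1_to_v2(doc: Doc) -> Doc:
    # Hypothetical change: v1 stored a single "name"; v2 splits it in two.
    first, _, last = doc["name"].partition(" ")
    rest = {k: v for k, v in doc.items() if k != "name"}
    return {**rest, "first_name": first, "last_name": last, "schema_version": 2}

def v2_to_v3(doc: Doc) -> Doc:
    # Hypothetical change: v3 replaces the single "email" with a list.
    rest = {k: v for k, v in doc.items() if k != "email"}
    emails = [doc["email"]] if doc.get("email") else []
    return {**rest, "emails": emails, "schema_version": 3}

# One upgrade step per historical version; this table only ever grows.
UPGRADES: dict[int, Callable[[Doc], Doc]] = {1: v1_to_v2, 2: v2_to_v3}
LATEST = 3

def upgrade_on_read(doc: Doc) -> Doc:
    # Assumption: documents written before versioning existed are v1.
    version = doc.get("schema_version", 1)
    while version < LATEST:
        doc = UPGRADES[version](doc)
        version = doc["schema_version"]
    return doc
```

Every read path has to go through something like `upgrade_on_read`, and every schema change adds another step that can never be deleted while old documents still exist.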
I think more sophisticated static analysis and migration generation tools would go a long way toward solving this, especially in combination with other tooling. Something like rope[3] for generating migrations, plus hypothesis[4] for property-based testing to generate test cases, would be nice as well. Definitely a hard problem, and definitely a worthwhile one to solve. If our team ever gets some free time, we'd enjoy building a toolkit that puts all of this together!
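As a rough idea of the hypothesis[4] half of that, here is a minimal sketch of property-based tests for a migration. The migration itself (a comma-separated `tags` string turned into a list), the row shape, and the function names are assumptions made up for the example; generating the migration code with rope[3] is not shown.

```python
from hypothesis import given, strategies as st

def migrate_up(row: dict) -> dict:
    # Hypothetical forward migration: comma-separated "tags" string -> list.
    return {"id": row["id"], "tags": row["tags"].split(",")}

def migrate_down(row: dict) -> dict:
    # Hypothetical rollback: list of tags -> comma-separated string.
    return {"id": row["id"], "tags": ",".join(row["tags"])}

# Strategy describing rows as they may exist under the old schema,
# including empty strings, stray commas, and arbitrary Unicode.
old_rows = st.fixed_dictionaries(
    {"id": st.integers(min_value=1), "tags": st.text()}
)

@given(old_rows)
def test_rollback_restores_original(row):
    # Property: migrating up and then down loses no information.
    assert migrate_down(migrate_up(row)) == row

@given(old_rows)
def test_new_schema_invariant(row):
    # Property: no migrated tag still contains the old separator.
    migrated = migrate_up(row)
    assert all("," not in tag for tag in migrated["tags"])
```

The appeal is that the properties are stated once and hypothesis goes looking for the awkward rows. A migration that also stripped whitespace or dropped empty tags would fail the first property, which is exactly the kind of silent data loss you want to hear about before running it in production.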
[1] https://en.wikipedia.org/wiki/Anchor_modeling
[2] https://github.com/kvesteri/sqlalchemy-continuum
[3] https://github.com/python-rope/rope
[4] https://hypothesis.readthedocs.io/en/latest/