| HN Mirror

I agree you may have a few transitional states to go through before dev becomes production. And in general, we didn't let the migration tools write our scripts for us, we used them to automate the testing and deployment of our scripts, maybe that is the key difference in how we used them. Which is maybe why I have never experienced what you are saying, when we added the tools in place, properly set things up and used them consistently we had less issues, and smoother deployments (not perfect, but smoother). But again, they were larger teams all working in the same data sets so we had to have coordination etc. Plus we had DBA's that were gate keepers for the database systems, and we never allowed developers write access to production, that is just asking for issues.

As for some of the issues I have seen with teams moving fast or that are larger. In one case we had around 40 commonly used databases, hundreds of tables in most of them, millions of data rows, multiple servers etc. Not saying the DB architecture was ideal, but it was not uncommon for larger systems at mid+ sized Enterprises I have seen as a consultant. Many times it is when an organization is between rapid growth still and getting discipline that you see a lot more of these types of issues (larger teams, fast moving still with a lack of BTDT leadership). The issues we saw between dev, stage and production were 1. the datasets were too large to replicate from Production to each environment, 2. security concerns over too many people having access to some of the data, 3. multiple teams split functionally instead of per sprint.

So how did these cause issues.

1. We would not see all the data in each environment and while we would test across a wide swipe of data and also run queries trying to validate our alter and update scripts it wasn't uncommon every couple of deployments to have DB scripts fail to run because of some data we couldn't test against. The toolset helped us a little here, because the DBA's set it up to auto test changes against the production data before it would let the script run. And every time a script would fail they would add new test cases if it warranted it.

2. Since not everyone could have access to all the data they couldn't predict every outcome, see #1. So the toolset helped us automate what they couldn't see so we could prevent some issues. This is very common when dealing with financial or PII in health care settings. Also, this SHOULD and could have been mostly resolved by not splitting teams functionally #3.

3. Teams were split by their function instead of sprint. We had a data team, analytics team and development, each working mostly independent of each other. Meaning it wasn't rare for the data team to release a script to production in the morning which didn't get applied to dev/staging and then development comes along a few hours later tests and gets sign off on their changes in staging. Then development deploys to production and has scripts fail to run in production or causing some data havoc (or vice versa between the teams). Had the teams been integrated better this would have been less likely but still not impossible. And I hear all the purist saying well the problem was someone skipped stage etc. Yep, it sure is a core problem, but frankly in every environment I have worked I have seen it happen and so to dismiss it as it shouldn't happen is naive. Like saying well I am not testing for any errors in my code because there shouldn't be any, uh huh. So when we used the toolset it would tell us hey, something has changed here, and the DDL looks different than what you said it should look like, abort.

Also, most places I have seen need to routinely do production to stage and dev refreshes to help prevent a lot of these types of issues, which right there tells you that shit happens and it is nearly impossible to plan for all outcomes when it comes to complex systems. So tools are the right thing to help you.

Last point, when we did have major deployments and a lot of DB scripts were going out, the toolset also would run through all the scripts and warn us of conflicts. It would give us false positives sometimes, and miss some things, but it really saved our asses more than once.

Sorry for the length, just wanted to share. FWIW, we agree the simple methods work, its just that tools and frameworks, implemented properly really do mitigate some of the human and process failures we all have from time to time.