|
|
|
|
|
by sd9
65 days ago
|
|
> The goal of the current maintenance is to fix a lot of long-standing issues with the site. The underlying infrastructure was getting very fragile as technical debt accumulated over time. A team is working very hard right now to make sure that once the site is back up, it's on much better footing and will be solid and reliable for the long term. Despite the unfortunate amount of time this is taking, it will be a major benefit to the site in the long run. If I were a developer there I would be feeling really not very good. Just minutes of downtime on the systems I’ve worked on gets my heart rate going. It also feels like there’s a lot being left unsaid in this statement. Normally you would work on these things in parallel to production… so something is seriously wrong. |
|
In all those cases there is serious planning done before the migration, checklists, trial runs/validations, and validation procedures day off. If something isn't working, the leadership group evaluates the the issue and determines rollback vs go forward. Rollback needs to also be planned for, and your planned downtime window should be considered.
I agree with you, this wording implies they are making changes after this change. This could've been bad planning, a bad call day off, etc.
In one scenario, we _had_ to go forward while resolving several blockers on the fly. We had planned ahead of time developer rotation shifts. Pulling people off the line after 8-12hrs. At some point, you aren't thinking clearly understress. Don't know how big the team is over there is, but I hope they are pacing themselves, during what I am sure is a horrible moment of crisis to them.
My advice to them is, consider a roll back if needed/possible. Split responsibility between who is managing the process and dealing with specific problems. Focus on MVP. Don't try to _fix_ and replace at the same time, if something was broken before business wise, log it in your bug tracker and deal with it later. Pull people away if needed to get rest. Get upper management away from people doing the work, have them only talking to the group handling the process management.
Edit: I am also making a good faith assumption that this is planned and not an emergency response, either way, it doesn't change my general advice.