| Except client A never crashed when receiving data from server A. You can rightly argue that client A was irresponsible in not protecting itself more intelligently against invalid or unexpected inputs (dumping core is the crudest, bluntest protection there is; not the best choice in a distributed environment where transmission errors are a fact of life); but the system as a whole worked. Had server B continued to supply client A with data in the format that client A expected to receive, no crash would have occurred and the entire rolling upgrade would have gone without a hitch. But no, the lazy irresponsible corner-cutting assholes couldn’t be arsed doing that; they just started pushing the new data to everyone, and then blamed everyone but themselves when that went sideways. “The correct answer is to roll back.” The correct answer is never to get into a state where rollback becomes necessary. Though having failed to do that, and so ended up in exactly this state, immediate rollback of B to A may well have been the next-best response, followed by a system audit to determine what integrity/data loss has occurred, a post-mortem of the procedures used, and subsequent corrections so that it doesn’t happen again. But if you think a bunch of cowboys who were only too happy to shirk their responsibilities during the (private) development phase are suddenly going to own up and accept personal liability when it blows up in the (public) rollout phase, then boy, do I have an eight-figure Enterprise-y grade bridge to sell you. |
We occasionally roll out bad software. I know of no reasonable set of practices which can avoid it.
I honestly don’t understand how you would expect to make this possible without an obscene budget and an insanely slow pace of development.