| > I think that you are saying is no, it doesn't matter. No, I'm not saying that. I'm saying that the best way to prevent that isn't always to have a staging environment that mirrors production as well as you can. > Individuals sitting around making isolated, disconnected decisions like the ones you're talking about (i.e., it just isn't worth it; it's not feasible; it's hard) compound in organizations and create the kinds of systems you don't want to deal with. You're making your own hell here. You seemed to have missed that key point in my earlier comment. No, this was an intentional decision by the organization not to keep investing time in solving the problem this way. After the organization had expended significant effort, the people it asked to investigate the problem concluded that the solutions would not be feasible and would not improve things. You're acting like these decisions are always made in a vacuum. They're not. Smart organizations often investigate and make these decisions at the leadership level. > Conflating test/experiment with what the original article claimed to be talking about (and then later walked back) is borderline disingenuous. No one is talking about A/B testing or intentional experiments. Are you sure? FTA: > We conduct experiments in risk management every single day, often unconsciously. Every time you decide to merge to master or deploy to prod, you’re taking a risk. > A healthy culture of experimentation and testing in production pulls together all three. Canarying is just testing in production, but with processes and "guardrails" (quoting the article) to make sure that it is done safely by default. For the record, I work primarily on reliability and release/experiment, so I'm well aware that being explicit about your decisions is vital, as is knowing the tradeoffs involved. 
That's why pretending that you don't test in prod is a bad idea: you almost assuredly do. That's what the article is saying. Edit: As for Cassandra, it looks like they have had system bugs caught in production, so I'm not sure what your point is (https://issues.apache.org/jira/projects/CASSANDRA/issues/CAS...) |