| I love articles like this because it's so easy to just add that company to a list of places to never ever work. I did read the whole article, btw. It's an absolute clickbait title that the author doesn't really mean, and after the article spends a lot of time defusing the clickbait title it really boils down to, "This is hard, so I give up." It's true that many--if not most--companies operate this way without ever acknowledging it. And that's bad. It's also true that systems are harder to test than code. But it's not deep fucking magic. Look at the work aphyr does. Look at the testing work that the FoundationDB team did to prove their system's guarantees. Look at the work that security and devops people do every day. She is right that it is hard to test systems. So what? We don't get paid as much as we do because it's easy. In a certain environment, it is truly impossible to test a system. That's when you have a dev culture that refuses to actually design knowable systems. A much better approach for the article would be to address exactly why systems are so hard to test rather than just saying fuck it. Everything she cites in her list of things that are hard to test is absolutely testable, if you have a knowable system. The real problem here is that agile/scrum/Extreme Programming practices inevitably and by principle do not result in knowable, testable systems. When you have 30+ agile teams on their own sprint cycles and product managers leaning on them to ship features and figure the rest out later, there can be no other result than a fragile, broken, unknowable, untestable system. But the answer to that isn't "Everybody else is doing it so why can't I." The answer isn't to "embrace it." The answer isn't "This is hard, fuck it." The answer is most definitely not to make individual engineers pay the price of being on call because a company's culture and process are totally and completely hosed.
The answer is to address the problems in your company that caused this situation in the first place. The answer is to get your head out of the feature cult and the velocity war and reset your priorities. Systems aren't hard because your engineers suck. They're hard because companies suck. Systems are hard because in most places, no one is allowed to spend more than a couple minutes thinking about the systems. Agile culture after your early startup cycle is a lot like being a 40 year old guy who's 30 lbs overweight. How did this happen? How did I get here? I was just taking life one thing at a time and getting shit done. Now nothing works quite as well as it used to, it's harder to find dates, and everything just sort of hurts. Would anyone in their right mind just say, "Embrace it! Most 40 year old tech dudes look about like you and are in the same situation! It's fine!" No. Of course not. You have to realize that your priorities have been totally broken for the last 15-20 years of your life, that you really weren't getting shit done, and you have to take some responsibility for your diet and get off your ass and exercise. That's what companies have to do. They won't, of course. But they have to, otherwise they'll die young. The author is totally correct when she recognizes a terrible symptom of unhealthy companies. But her treatment is hopelessly and tragically wrong. |
If, however, your configuration space grows to even a middling size, it is no longer feasible to do much of this validation across the whole space. A good example is any system where the user can customize aspects of the system. Do you run all of your integration tests across the full configuration space?
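To put rough numbers on that, here's a minimal Python sketch. The option names, their values, and the 20-minute suite runtime are all made up for illustration; the point is only that the number of combinations is the product of the number of choices per option.

```python
from itertools import product
from math import prod

# Hypothetical configuration options; names and values are illustrative only.
CONFIG_OPTIONS = {
    "storage_backend": ["postgres", "mysql", "sqlite"],
    "cache": ["none", "redis", "memcached"],
    "auth": ["password", "oauth", "saml"],
    "region": ["us", "eu", "ap"],
    "tls": [True, False],
    "feature_x": [True, False],
}

# Assumed runtime of one full integration-test pass, in minutes.
SUITE_MINUTES = 20


def config_space_size(options):
    """Number of distinct configurations: the product of choices per option."""
    return prod(len(values) for values in options.values())


def all_configs(options):
    """Yield every configuration as a dict (only practical for small spaces)."""
    keys = list(options)
    for combo in product(*(options[key] for key in keys)):
        yield dict(zip(keys, combo))


size = config_space_size(CONFIG_OPTIONS)   # 3 * 3 * 3 * 3 * 2 * 2 = 324
total_hours = size * SUITE_MINUTES / 60    # 324 * 20 / 60 = 108.0

print(f"{size} configurations, about {total_hours:.0f} hours of test time per full sweep")
```

Six modest options already mean 324 full test passes per sweep. Add one more two-valued flag and it doubles to 648, which is why most teams end up testing only a small sample of the space.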
Additionally, managing configuration skew between dev and prod environments is not simple. Simply claiming that there should be no skew doesn't work. Often you want the prod and dev environments to run as different users, and you certainly want them to have different ACLs (your dev environment should not have access to your production database).
So you now have to, across your configuration space, validate that only the things that are "supposed" to be different differ, and that the things that aren't don't. Which maybe works for a while, but your prod configuration may also differ across parts of prod if, for example, a change is being canaried or incrementally deployed.
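For concreteness, a skew check along those lines might look roughly like the sketch below. This is not the system I worked on; the keys, values, and the allow-list are hypothetical, and it assumes each environment's configuration can be flattened into a simple key/value dict.

```python
# Sentinel so that "key absent" compares as different from any real value.
MISSING = object()

# Hypothetical allow-list: keys that are supposed to differ between environments.
ALLOWED_TO_DIFFER = {"run_as", "db_host"}


def unexpected_skew(dev, prod_cfg, allowed_to_differ):
    """Keys that differ between dev and prod but are not on the allow-list."""
    all_keys = set(dev) | set(prod_cfg)
    return sorted(
        key
        for key in all_keys
        if key not in allowed_to_differ
        and dev.get(key, MISSING) != prod_cfg.get(key, MISSING)
    )


def unexpectedly_same(dev, prod_cfg, must_differ):
    """Keys that are supposed to differ but have the same value in both."""
    return sorted(
        key
        for key in must_differ
        if dev.get(key, MISSING) == prod_cfg.get(key, MISSING)
    )


# Illustrative configurations only.
dev_cfg = {"run_as": "dev-svc", "db_host": "db.dev.internal", "timeout_s": 30, "retries": 3}
prod_cfg = {"run_as": "prod-svc", "db_host": "db.prod.internal", "timeout_s": 60, "retries": 3}

print(unexpected_skew(dev_cfg, prod_cfg, ALLOWED_TO_DIFFER))    # ['timeout_s']
print(unexpectedly_same(dev_cfg, prod_cfg, ALLOWED_TO_DIFFER))  # []
```

The check itself is the easy part. Once a canary or incremental rollout is in flight there is no single prod configuration to compare against, so the check has to run per variant, and someone has to keep the allow-list current for every one of them.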
I've spent a non-trivial amount of effort trying to solve the one problem of configuration skew between dev and prod for one real system. It's ultimately not worth it. "Fixing" it would be more work than leaving it alone. And I mean that in the long term: the effort to maintain and follow the rules that such a system would impose is greater than the effort of dealing with the annoyances of unintended skews.
Systems are hard because systems are hard. There's no good company that doesn't test/experiment in production. All of them do.