|
> 3. "Infrastructure Sabotage" - this is the one that I think annoyed me the most, because I've seen cloud costs explode where hardly anyone had a good grasp on where that money was going. On the other hand, I worked in a place where the Change Request -> code review -> merge pipeline took ~two days on average, and could sometimes span over a week, all because of ridiculous penny-pinching on infra. The CI build itself took 1.5h, of which over 60% was testing. That was understandable, if not ideal. The problem, however, was that the whole infrastructure could support evaluating maybe 5-6 builds in parallel (20-ish runners, each build typically consuming 3 or 4, depending on type and platform configurations), and that was shared with automated tests (which ran nightly), as well as all kinds of one-off release builds for QA, manual builds, etc. So it took only two or three changesets being built in parallel before the next one had to wait for bots to become available. Add to that a few flaky tests that would fail one of the builds for your changesets about 30% of the time for spurious reasons, and people submitting more patches (= more build jobs) in response to review feedback, and you can imagine what it did to work cadence. To top it off, there was a cultural push for "best practices" of small and frequent changes; of course, doing that basically meant you were submitting changes faster than they were built, DoS-ing the build system for everyone else. And lord forgive you if you tried to submit a stack of commits at the same time. My co-worker and I both independently discovered that if you submitted a big enough stack (working on a large feature, broken down for reviewability), say 7-9 commits, not only would you saturate the build system, but apparently OpenStack, or whatever was managing it, would run out of resources and cascade failures to some other systems elsewhere in the company, because of course it would do that.
I fought to get this improved from almost day 1 on that team, but it was always met with reactions from IT (later, "devops") along the lines of: "what is your problem?" or "yes, I know, but we don't have the budget" (seriously? compute is, and was then, almost too cheap to meter). Eventually they strung us along with half a year of "we're migrating to $BigCloudProvider, afterwards we'll have plenty of compute", which never happened before I left. So I've seen first-hand how ffrupid (double "f" is intentional) otherwise smart companies with large budgets can get with compute for devs. This wasn't just destroying morale and preventing the team from adopting some good practices; it was actively slowing down project work, delaying both scheduled releases and emergency fixes. |