by raffraffraff 813 days ago

"Far too many companies (and DevOps teams) think the www site is "not important" or "not their core job" and outsource it to either a less qualified team, or out of the company altogether"

It's impossible to know because they won't admit it publicly. You are guessing based on some anecdotal experience.

But then again... here's mine! I worked at a very successful SaaS that had (really not kidding) the most incompetent, lazy dope running the www site. He live-edited a "staging" version of the site on the fly (no, it wasn't private, you could access this thing from the internet, and he didn't know or care about that). When he was happy with his changes he'd destroy the live instances behind the load balancer and clone his staging instance without taking it down or running any extra checks. This staging instance was around for years and I don't think he ever bothered doing a system update. Since he didn't use git, I'll bet that at least once he cloned a live instance back to staging to undo a bunch of bork.

I lost count of the incidents. He never detected them himself, was never available to troubleshoot them and was generally a big "durrrr" when you'd finally get him on the call. Example: one time we had a "slow, intermittent errors" customer support ticket surfaced to us, not because it was our job, but because dopey was being an absolute ass to the helpdesk guys. He ran his crap in another AWS account we didn't have access to. About a day later the www site went down completely, so we got hold of the AWS account and dug in. All 5 of the instances behind the load balancer were "unhealthy" for various reasons. Certs expired, disks full, apache stopped. We bounced them, restarted them and sshed in. They all had different versions of the site. It was a complete mess. Turns out dopey wasn't very good at killing the old instances and cloning staging. He was probably live-editing the instances for smaller changes if that seemed easier than a bunch of AWS console work.

Unbelievably, he wasn't fired and continued to mismanage the site, and we could do nothing because the head of marketing didn't listen to the head of engineering. They hated each other. The way Marketing saw it: "your SRE guys couldn't fix it, they had to wait for <dopey> to get on the call". I'm not even kidding.

Just more anecdotal evidence from me. You might be right.