HN Mirror

by hdjrudni 201 days ago

> though it probably goes down less than something you self-manage, unless you're a full-time SRE with the experience to back it.

I wonder how true that is. This went down because of a bad update, which is probably like 99.99% of outages. The other 0.01% is cosmic rays causing hardware failures.

My server was up for 3.5 years with no outages because I just didn't touch it. I had to take it offline a couple days ago to move it which made me a little sad. Took a snapshot and moved it to a new droplet, brought it back up as-is and it's running great again.

Anyway, emergencies are less emergy if things go down while you're upgrading and shuffling things around yourself. You expect hiccups if you're the one causing the hiccups. It's when someone else is tinkering on the other side of the country/planet and blows something up that suddenly you have an emergency.

2 comments

kikimora 197 days ago

> My server was up for 3.5 years with no outages because I just didn't touch it.

Problem #1: keeping the OS current. Chances are you're running an outdated OS with some RCE vulnerabilities.

Problem #2: the setup is hard to scale organizationally. How do you give other people access to the server? How do you monitor what they do? How do you replicate the server setup across teams and keep it in sync? And so on and so forth.

In an org, something always changes, and you have to touch servers as a result.

Nextgrid 201 days ago

I concur. I've seen a lot of companies outside the techbro world where the entire thing runs on a single VPS/dedicated server with a setup that would make any sysadmin squirm. And yet, it just works and makes them money?

Which isn't too surprising - hardware is extremely reliable nowadays. When's the last time your laptop broke? And that laptop lives a much harsher life than server HW in a datacenter. Obviously everyone is going to have their own anecdotes about this, but I think it's fair to say that overall the failure rates are quite low.

You know why their (often awful) setups work and consistently beat the major clouds in terms of uptime? No moving parts like K8s and all the "best practices", and most importantly, nobody is "fixing" the working setup until it stops working. Ironically, they are getting better uptime by avoiding all the things that are marketed as improving uptime.