Hacker News new | ask | show | jobs
by freerobby 4517 days ago
Thanks.

Does anybody else here agree with this mentality? This seems a major mispractice to me. I've worked at companies with as few as two people to as many as 50,000 people. None of them have had production systems that are entirely self-maintaining. Most startups are better off being pragmatic than investing man-years of time handling rare error cases like what to do if you get an S3 upload error while nearly out of disk space. There's a good reason why even highly automated companies like Facebook have dozens of sysadmins working around the clock.

I thought all of his other points were spot-on but this one rings very dissonant to my experience.

3 comments

I agree - it seems off to me. Sometimes you really want to diagnose your problems manually.

I'm also wondering how command and control is maintained without SSH access. Is there some kind of autoupdating service polling a master configuration management server (i.e., puppet's puppetmaster)?

I can appreciate that ensuring that a typical deploy doesn't require hand-twiddling. That makes sense, lots of it. But not disabling SSH.

I think the problem is that I've made it seem like a strict rule in the article; "You must disable SSH or everything will go wrong!!!". It's really just about quickly highlighting what needs automating. Like you say, sometimes you just want to diagnose your problems manually, and that's fine, re-enable SSH and dive in. But if you're constantly fixing issues by SSHing in and running commands, that's probably something you can try to automate.

Personally I always had a bad habit of cheating with my automation. I would automate as much as I could, and then just SSH in to fix one little thing now and then. I disabled SSH to force myself to stop cheating, and it worked well for me, so I wanted to share the idea.

Of course, there's always going to be cases where it's simply not feasible to disable it completely. It depends on your application. The ones I've worked on aren't massively complex, so the automation is simpler. I can certainly see how not having SSH access for larger complex systems could become more of a hindrance.

Facebook has dozens of sysadmins working 24x7, but they also have 200,000 servers.

While diagnosing issues may require logins, you aren't going to be fix anything on 200,000 servers without automation. At large scale, all problems becomes programming problems.

It also means the death of jobs for sysadmins that only know how to go in tweak *.conf files and reboot servers. So, I guess that sucks for you.

Agreed, but two important notes:

1) Automation doesn't have to be complete automation. I can use Chef tools like knife-ssh to run a single command on every one of our boxes in near-real time. This might not be automated to the extent that the OP is referring but it's 99.8% more efficient than logging into 500 boxes to do it (or 200,000 in Facebook's case). If you can get 99.8% with ease, it may not be worth pre-automating for that final 0.2%.

2) Knowing what is worth automating comes from experience. If I have to do something nasty once, it might be a total waste of time to automate it. There are lots of one-off things I type at a command prompt that are quicker to run than to automate. I think being so dogmatic about automation as to say you should never run things at a command line requires you to spend peple's time non-optimally.

No there is no reason to give up on such a tool; the only good reason to support such idea is to get some sort of attention at your person by claiming something crazy