by ttul 886 days ago
Incompetence. Take my friend’s company for instance. They were frustrated paying $60K/mo to Amazon so their brilliant sysadmin bought $600K of servers and moved them into a cheap colo.

Over Christmas, everything died, and the brilliant sysadmin was on holiday. Nobody could get things going again for many days and so their entire SaaS business was failing. They lost a lot of business and trust as a result.

The sysadmin is now gone and they are back on AWS.

3 comments

No key-person risk management -> no risk register -> no management. Your friend's company will fail whether or not the sysadmin made poor decisions. They need to hire competent management ASAP.
This is basically the logic of people who say the cloud is too expensive: you have to ignore so many things to make being on-premise look logical. Basically you are lying to yourself if you think you can run a datacenter cheaper and better than Amazon or Microsoft can, because if you can, you are making huge sacrifices somewhere (usually time, which is why reddit sysadmins complain about how much work they have while defending being on-premise, because they couldn't possibly be wrong).
you must be management, cuz

1. you think it's the sysadmin's fault

2. you think there are no competent sysadmins out there

>Basically you are lying to yourself if you think you can run a datacenter cheaper and better than Amazon or Microsoft

what magical things do they have that every single reasonably sized enterprise doesn't have? it should be extremely easy for a small enterprise to beat any of the main clouds* - they make a crap ton of profit from you

*assuming your needs are reasonably static and you're not MASSIVELY bursting your infrastructure up and down

The "magical" thing they have is thousands and thousands of people thinking about how to improve the performance/efficiency/availability of their datacenters.

And yes, they pay the costs of those people and take a good profit margin, and yes there are in some ways diminishing returns to go from 3 nines to 6. But most enterprises can't match that depth of concentrated expertise, certainly not most small enterprises.
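For scale, the gap between 3 nines and 6 can be put in numbers. This is a minimal sketch, assuming "N nines" means an availability of 1 - 10^-N measured over a 365-day year; the function name is hypothetical and nothing here comes from any provider's SLA.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def allowed_downtime_seconds(nines: int) -> float:
    """Yearly downtime budget for an availability of 1 - 10**-nines."""
    unavailability = 10.0 ** -nines
    return SECONDS_PER_YEAR * unavailability


for n in (3, 4, 5, 6):
    seconds = allowed_downtime_seconds(n)
    print(f"{n} nines: {seconds:.1f} s/year ({seconds / 3600:.4f} h)")
```

Under those assumptions, 3 nines allows roughly 8.76 hours of downtime a year and 6 nines roughly 31.5 seconds, which is where the diminishing returns come from.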

just to confirm, when i say small(ish) enterprise i'm referring to a company with around 500+ people and an IT dept of 20+ people

seriously??!? you think you need thousands of people to improve your efficiency and performance? I would strongly suggest employing a few good infrastructure engineers/architects who know what they're talking about - there really is no secret sauce!! just lots of kool aid on cloud

the whole cloud thing only looks wonderful and magical if you're inexperienced.

re: uptime, the whole count-the-nines thing is fairly useless in the real world, since the architecture of the application is really the king here!! christ on a stick, i had an application running at 100% for 3+ years on NT 4 because of good design (the clue is active - active - active)

also to add... availability zones are a very very poor man's DR

With cloud and SaaS services you are paying to reduce your key-person risk.

With a custom system you're forming a larger dependency on a team lead, and the system becomes a liability because new people who come to the organization don't want to adopt an abandoned, poorly understood project.

This company is reasonably well run. After going back to AWS, they doubled their revenue and things are going well. They are not incompetent. They did earnestly try to cut their costs and just didn’t see the iceberg.
Faint ISO 27001 sounds in the background
> brilliant sysadmin was on holiday

> entire SaaS business

> [ Unmentioned - Single Point of Failure Service dependent on a single admin ]

If you are fully accounting for vacation, training, sleep, etc., then you need a minimum of 5 admins for mission-critical services. Now, you can engineer around this to reduce your staffing requirement, but I wouldn't recommend ever going under 2, because accidents happen.

This business seems to have been one below that, without the engineering, and I would point to the mgmt, not the brilliant admin, as the problem.
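The "minimum of 5" figure can be reproduced with rough arithmetic. This is only a sketch: it assumes one admin must be on duty every hour of the week, a 40-hour contract week, and about 8 weeks a year lost to vacation, training, and sickness. All three numbers and the function name are illustrative assumptions, not figures from the comment.

```python
import math

HOURS_PER_WEEK = 24 * 7  # 168 hours of round-the-clock coverage


def admins_needed(contract_hours_per_week: float = 40.0,
                  weeks_off_per_year: float = 8.0) -> int:
    """Smallest headcount that keeps one person on duty at all times."""
    # Fraction of the year each admin is actually available (assumed values).
    working_fraction = (52.0 - weeks_off_per_year) / 52.0
    effective_hours = contract_hours_per_week * working_fraction
    return math.ceil(HOURS_PER_WEEK / effective_hours)


print(admins_needed())
```

Even with zero weeks off, 168 / 40 rounds up to 5, so the number is driven mostly by the clock rather than by the leave assumption. Reducing it means engineering away the need for a human on duty at every hour.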

> The sysadmin is now gone and they are back on AWS.

This story has nothing to do with AWS or on-prem.

It's a story about incompetent management allowing a single human point of failure. If they don't change that, they'll have the same problem wherever they go.