by adamckay 1075 days ago
You have to decide whether the complexity and cost of a fully redundant system are worth it, weighed against what your SLA actually is, especially if the redundancy itself increases the risk of something going wrong because of the extra complexity.

From personal experience in B2B web apps, a lot of sales/business MBA types will say they need 100% uptime, but what they actually mean is that it needs to be available whenever their customer's users want to access it. Those users are business users who work 9-5, so there's plenty of scope for the system to be down (whether due to a genuine outage or maintenance/upgrades).
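To put a rough number on that scope, here is a minimal sketch. It assumes a 5-day week with 8 working hours per day and a single time zone; the function name and defaults are illustrative, not taken from any real SLA.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def off_hours_fraction(days_per_week: int = 5, hours_per_day: int = 8) -> float:
    """Fraction of the week that falls outside business hours.

    Assumes every user works the same hours in one time zone, which is a
    simplification; customers in several time zones shrink the window.
    """
    business_hours = days_per_week * hours_per_day
    return (HOURS_PER_WEEK - business_hours) / HOURS_PER_WEEK


if __name__ == "__main__":
    # 40 of 168 hours are business hours, so 128 hours are off-hours.
    print(f"{off_hours_fraction():.1%} of the week is outside 9-5")
```

Under those assumptions roughly three quarters of the week is outside business hours, which is the window available for maintenance that users never notice.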

You've possibly also got the bonus that the people who use the app are different from the people who pay for it, so you've got some leeway: your system can blip for a minute and have requests fail (as long as there's no data loss), and that won't get reported up the customer's management chain, because hitting F5 30 seconds later springs it back into life. They carry on with their day without bothering to fire off an email or walk over to their boss's desk to complain that the website was broken for a second.

At a previous company we deployed each customer on their own VM in either AWS or Azure, with both the app and the database on it. It was pretty rare for a VM to fail, and when it did the cloud provider automatically reprovisioned it on new hardware, so as long as your startup scripts are configured correctly and run quickly, you might be down for only a few minutes. It was incredibly rare for an engineer to have to intervene manually, but because our setup was very simple we could nuke a VM, spin up another one, deploy the software back onto it, and be up and running again in under 30 minutes, which to us was worth the reduced costs.
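As a back-of-the-envelope check on that trade-off, the sketch below compares a downtime budget implied by an SLA with the downtime from a handful of full rebuilds. The 99.9% SLA and the two rebuilds per year are made-up figures for illustration; only the 30-minute rebuild time comes from the setup described above.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_budget_minutes(sla: float) -> float:
    """Minutes of downtime per year allowed by an availability SLA (e.g. 0.999)."""
    return (1 - sla) * MINUTES_PER_YEAR


def availability(failures_per_year: float, minutes_per_recovery: float) -> float:
    """Yearly availability if each failure costs a fixed recovery time."""
    downtime = failures_per_year * minutes_per_recovery
    return 1 - downtime / MINUTES_PER_YEAR


if __name__ == "__main__":
    # Hypothetical: a 99.9% SLA and two full 30-minute rebuilds per year.
    budget = downtime_budget_minutes(0.999)
    used = 2 * 30
    print(f"budget: {budget:.1f} min/year, used: {used} min/year")
    print(f"availability: {availability(2, 30):.5f}")
```

With those illustrative numbers, a single non-redundant VM stays well inside a three-nines budget, which is the kind of arithmetic worth doing before paying for full redundancy.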