| HN Mirror

	by vundercind 590 days ago
	My experience has been that the vast majority of systems could tolerate a few minutes offline per month for upgrades. Many could tolerate a couple of hours per month. That does little or no actual business harm, and it brings enormous cost savings and higher development velocity, because you skip the complexity needed for ultra-high uptime and zero-downtime upgrades. What's vital is being able to roll back a recent update, recover from backups, and deploy from scratch, all quickly. Those are usually (not always) far easier to achieve than ultra-high-uptime architecture, which also needs those things but makes them all more complicated. They can be simple enough to operate purely over ssh with a handful of ordinary shell commands documented in runbooks. Skipping them, though, is how you cheap out in a bad way and end up with a system that's down for multiple days, or one that you're afraid to modify or update.
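
	To put rough numbers on "a few minutes" versus "a couple of hours" per month, here is a back-of-the-envelope sketch. The 30-day month and the function name are assumptions made for illustration, and 5 and 120 minutes are just example values for the two cases:

```python
def availability_percent(downtime_minutes: float, days_in_month: int = 30) -> float:
    """Return monthly availability as a percentage for a given downtime budget.

    Assumes a month of `days_in_month` days (30 by default, an illustrative choice).
    """
    total_minutes = days_in_month * 24 * 60  # 43,200 minutes in a 30-day month
    return 100.0 * (1.0 - downtime_minutes / total_minutes)


if __name__ == "__main__":
    # "A few minutes" per month, taking 5 minutes as an example value.
    print(f"{availability_percent(5):.3f}%")    # 99.988%
    # "A couple of hours" per month, taking 120 minutes as an example value.
    print(f"{availability_percent(120):.3f}%")  # 99.722%
```

	So under these assumptions, a five-minute monthly maintenance window is still well above "three nines", and a two-hour window sits between two and three nines.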