| HN Mirror



	by zzzcpan 2934 days ago
	So how do you choose that service level objective? How do you know which solutions to implement so that you don't make things "overly reliable"? Isn't that the more important question? Doing this without some sort of methodology will almost always result in useless solutions and in overpaying cloud and other hosting providers: for example, implementing rather expensive failover within a datacenter while ignoring how unreliable datacenters are and how cheaply you can implement failover between datacenters via DNS. I like the idea of modelling availability/reliability for this. Even if you don't have the right numbers and do it on a napkin rather than in code, it can still highlight the solutions with the best cost/benefit ratios.
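To make the napkin idea concrete, here is a rough sketch of that kind of model. The availability figures and the names (`SERVER`, `DATACENTER`, `in_series`, `in_parallel`) are made up for illustration, and the model assumes failures are independent and failover is instant, which ignores things like DNS TTLs. It is only meant to show how the comparison could be set up.

```python
from math import prod

def in_series(*availabilities):
    """Every component must be up, so availabilities multiply."""
    return prod(availabilities)

def in_parallel(*availabilities):
    """One working replica is enough (assumes independent failures)."""
    return 1 - prod(1 - a for a in availabilities)

# Hypothetical numbers, for illustration only.
SERVER = 0.99       # one server
DATACENTER = 0.999  # the datacenter around it (power, network, ...)

# One server in one datacenter.
single = in_series(DATACENTER, SERVER)
# Two servers with failover inside the same datacenter.
local_failover = in_series(DATACENTER, in_parallel(SERVER, SERVER))
# One server in each of two datacenters, failing over between them.
cross_dc_failover = in_parallel(single, single)

HOURS_PER_YEAR = 24 * 365
for name, a in [
    ("single server", single),
    ("failover within datacenter", local_failover),
    ("failover between datacenters", cross_dc_failover),
]:
    downtime = (1 - a) * HOURS_PER_YEAR
    print(f"{name}: {a:.5f} available, ~{downtime:.1f} h/year down")
```

With these made-up inputs, failover inside one datacenter can never beat the datacenter's own availability, while the cross-datacenter setup can. Putting a cost next to each option turns this into the cost/benefit comparison.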

1 comments

gcardone_ 2934 days ago

Disclaimer: I am an SRE at Google, opinions are my own.

There's an excellent talk by Google VP of SRE Ben Treynor: https://www.youtube.com/watch?v=iF9NoqYBb4U. tl;dw: try to measure actual user experience, and make sure that even the long tail of customers still gets a good product experience. What "good product experience" means depends on your product.

The rest of the error budget is for you to spend on releasing new features, changing the underlying architecture, etc.
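As a rough illustration of the arithmetic: the error budget is the unreliability your SLO allows over some window. The sketch below assumes a time-based availability SLO and a 30-day window. The function names and the example numbers are hypothetical, and a request-based SLO would count failed requests instead of minutes.

```python
def error_budget_minutes(slo, window_days=30):
    """Downtime allowed by an availability SLO over the window."""
    return (1 - slo) * window_days * 24 * 60

def budget_remaining(slo, downtime_minutes, window_days=30):
    """Budget left after the downtime already incurred (may go negative)."""
    return error_budget_minutes(slo, window_days) - downtime_minutes

# Hypothetical example: a 99.9% SLO allows about 43.2 minutes per 30 days.
slo = 0.999
spent = 12.0  # minutes of user-visible downtime so far (made-up figure)
print(f"budget: {error_budget_minutes(slo):.1f} min")
print(f"left:   {budget_remaining(slo, spent):.1f} min")
```

Whatever is left is what you can spend on risky launches or migrations. If it goes negative, that is the signal to slow down and work on reliability.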
