by lionkor
799 days ago
I'm not sure what 100% uptime is supposed to mean. Zero downtime is like zero bugs: it's not going to happen. Even 99.9999% uptime would be realistic, but 100% is not. There are factors outside your control, and you can't guarantee they won't be down for even 1 ms per year. How do you manage 100% uptime when your DB is only going to be 99.9%? Hitting 99.9% is almost easy; hitting 100%, or even 99.99999%, is really, really hard. It's just silly, and it may show that there are some unrealistic assumptions. Or maybe the team doesn't know much about reliability, which is also concerning. Just target six nines and be done with it, or something. In May of last year, NPM was down to 99.8% for the month.
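To put a number on the DB point, here is a rough sketch in Python. It assumes the service needs every dependency up at the same time and that failures are independent, which is a simplification. The function name and the 99.9999% application figure are made up for illustration; only the 99.9% DB figure comes from the comment above.

```python
def serial_availability(*components: float) -> float:
    """Availability of a service that needs every component up at once.

    Assumes independent failures, so the individual availabilities multiply.
    """
    result = 1.0
    for availability in components:
        result *= availability
    return result


app = 0.999999  # hypothetical six-nines application tier
db = 0.999      # the 99.9% database from the comment

combined = serial_availability(app, db)
print(f"app + db availability: {combined:.6%}")

# Even a perfect application cannot do better than its database.
print(f"perfect app + db:      {serial_availability(1.0, db):.6%}")
```

Under these assumptions the combined figure can never exceed the weakest dependency, so a 100% target on top of a 99.9% DB only works if the service can keep running while the DB is down.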
If we were to provide an SLA (an agreement stating the minimum level of service to a customer) for this service, it would not be 100%. It would be 99.99%. This is to avoid risk. But we can still have an internal target (an SLO) that is higher than the SLA we provide.
If we have to make every change in a way that allows not even 8 seconds of downtime a year, but 0 seconds, that significantly changes how we design the system and roll out changes.
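For reference, the arithmetic behind downtime budgets is easy to sketch in Python. The helper names below are made up for illustration, and a year is taken as 365 days.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # non-leap year, an assumption


def downtime_budget_seconds(availability: float,
                            period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Allowed downtime in seconds for an availability target over a period."""
    return (1.0 - availability) * period_seconds


def availability_for_budget(downtime_seconds: float,
                            period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Availability implied by a given amount of downtime over a period."""
    return 1.0 - downtime_seconds / period_seconds


for target in (0.999, 0.9999, 0.99999, 0.999999, 1.0):
    per_year = downtime_budget_seconds(target)
    per_day = downtime_budget_seconds(target, SECONDS_PER_DAY)
    print(f"{target:.4%}: {per_year:10.1f} s/year, {per_day:8.3f} s/day")

# What a budget of 8 seconds per year works out to as a percentage.
print(f"8 s/year -> {availability_for_budget(8):.6%}")
```

A 99.99% SLA leaves roughly 52.6 minutes per year (about 8.6 seconds per day), while a 100% target leaves a budget of exactly zero. That is why the internal target shapes deployment design far more than the SLA does.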
TLDR: SLA != SLO