Hacker News
by erikb 2896 days ago
So there is one obscure metric, "service is available, i.e. can do its job", and this metric has different attributes: there are actual measured values (SLIs), there are internal goals (SLOs), and there are legally binding promises (SLAs) to users/customers. I would argue that there is not much content here.

Content, imo, would be something like this: We define "available" as "processor_load<99% and disk_load<99% and ram_load<99% and server responds with HTTP 200 on port xyz", because reason_a, reason_b, reason_c. But other people could argue that it is less about the node and more about how service_x is experienced, so one could track the latency of HTTP responses to user requests and require that they come in under 0.1 sec more than 95% of the time, etc.
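
To make the two competing definitions concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the `NodeSample` type, the function names, and the idea that load figures arrive as percentages and latencies as seconds are all made up here, and the thresholds are just the example numbers from above, not recommendations.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NodeSample:
    """One hypothetical health sample for a node (loads in percent)."""
    processor_load: float
    disk_load: float
    ram_load: float
    http_status: int  # status returned by the health-check request


def node_available(sample: NodeSample) -> bool:
    """Node-centric definition: every load below 99% and an HTTP 200."""
    return (
        sample.processor_load < 99
        and sample.disk_load < 99
        and sample.ram_load < 99
        and sample.http_status == 200
    )


def fraction_fast(latencies_s: Sequence[float], threshold_s: float = 0.1) -> float:
    """SLI: share of user requests answered in under threshold_s seconds."""
    if not latencies_s:
        raise ValueError("need at least one latency measurement")
    fast = sum(1 for latency in latencies_s if latency < threshold_s)
    return fast / len(latencies_s)


def meets_latency_slo(
    latencies_s: Sequence[float],
    threshold_s: float = 0.1,
    target: float = 0.95,
) -> bool:
    """SLO: more than `target` of requests must be faster than threshold_s."""
    return fraction_fast(latencies_s, threshold_s) > target
```

The point of the split is that `fraction_fast` is the measured value (the SLI), `target` is the goal (the SLO), and whatever number ends up in a contract (the SLA) would normally be looser than `target`. The two definitions can also disagree: a node can pass `node_available` while users still see slow responses.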

That you should track metrics, that you should set goals, and that you should define SLAs with your customers/users is standard business practice, not new knowledge.