| HN Mirror

I wish it was that easy - our teams have their targets for p99 and p995 ratios but they cannot capture the overall user experience. For us it's not just the ratio of failed requests, but closer to a four-tuple of:

  * maximum number of users affected
  * maximum time of unavailability
  * maximum observed latency 
  * highest ratio of failed requests over a sequence of relatively tight measurement windows

Those are demanding constraints, but such is reality when peak trading activity can take place within just a few minutes. If users can not place their trades during those short windows, they will quickly lose confidence and take their business elsewhere.

So yes, request ratio is certainly a good part of the overall SLO but covers only a portion of the spectrum.