by mtlynch 847 days ago
I was surprised to see such miserable measured latency to Fly, but then I saw this note:

>The primary region of our server is Amsterdam, and the fly instances is getting paused after a period of inactivity.

After they configured Fly to run nonstop, it outperformed everyone by 3x. But it seems like they're running the measurement from Fly's infrastructure, which biases the results in Fly's favor.

Also weird that they report p75, p90, p95, p99, but not median.

3 comments

I’m not sure what’s customary in most places, but in my experience you base most things off of avg and p99 and then you can use other percentiles to interpolate the shape of the distribution in the event you need a better model (you usually don’t). Of course I’m sure this sort of thing varies wildly by use case.
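As a rough illustration of how avg and the percentiles describe different parts of the distribution, here is a minimal sketch. The latency samples are made up, and `percentile` is a hypothetical helper using the nearest-rank method. Monitoring tools often interpolate instead, so their numbers can differ slightly.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # p * len is computed before dividing so integer percentiles stay exact.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: mostly fast, with a slow tail.
latencies_ms = [20] * 90 + [200] * 9 + [2000]

summary = {
    "avg": statistics.mean(latencies_ms),
    "p50": percentile(latencies_ms, 50),
    "p90": percentile(latencies_ms, 90),
    "p95": percentile(latencies_ms, 95),
    "p99": percentile(latencies_ms, 99),
}

for name, value in summary.items():
    print(f"{name}: {value} ms")
```

With these made-up samples the avg (56 ms) sits well above the p50 (20 ms) because a single 2000 ms request drags it up, while p95 and p99 (both 200 ms) show the tail directly.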
I admit that I don't deal a lot with latency measurements, but my memory from working at Google was that we focused on p50, p95, and p99. That was six years ago, and it wasn't my focus there, so it's possible I'm misremembering (especially based on all the responses saying I'm the weirdo for expecting p50).

Looking at Google's SRE book, they use p50, p85, p95, and p99, so it's possible I'm misremembering or that Google uses unusual metrics:

https://sre.google/sre-book/service-level-objectives/#fig_sl...

I wasn’t aware of the Google SRE book. Thanks, I’ve bookmarked it!
> Also weird that they report p75, p90, p95, p99, but not median.

I'm not aware of P50s having ever been a relevant performance metric in latency. The focus of these latency measurements was always the expected value for most customers, and that means P90-ish.

Using median to report latency is very weird. I haven't seen any tech report that uses the median, though, and I'm curious to see one. Can you point to one?