by acje 837 days ago
Yes, that is a good one. I guess a TL;DR for the FCC would be: a) No, do not bother, because most users will not be able to comprehend meaningful latency metrics. b) Report best-case latency under idle load to the nearest IXP, because this would compose well with bandwidth properties when reasoning about the suitability of the service for different workloads.

Of course, there be dragons. Oversubscription in various parts of the network can produce interesting behaviour, especially when coordinated user behaviour or complicated packet processing is at play. Think encapsulation or deep packet inspection. My worst-case experience includes Cisco Nexus M1 32x10Gbps line cards maxing out at 1-2 Gbps throughput and 1+ sec worst-case latency because of OTP. This is a datacenter core switch. Another was an F5 WAF eating random packets because something looked like a VISA card number, causing a retransmit, which in turn shows up as high latency at higher levels.


Oh goodness no, always measure latency under full load, in the middle of the bandwidth test. A convenient example is https://www.waveform.com/tools/bufferbloat which can easily tell crappy ISPs from good ones.
Measuring latency under full load is like measuring how fast you can drive into a crossing and not make the turn. It is meaningless. See the fantastic explanation by Gil Tene in the link above.

With best-case latency you can determine whether the service can be suitable for real-time use or not. There will always be buffer effects, and they will vary with other user activity and the complexity of the packet processing. These effects will largely be unknowable in advance, and the part that is knowable is extremely difficult to communicate to an average user. They don't really get HDR histograms https://github.com/HdrHistogram/HdrHistogram.NET/blob/master...

I have tried to point out MANY times that the DEFAULT behavior of the TCP slow start algorithm is to saturate the link, however briefly, get a drop, and then back off. I try to explain it with some humor, using jugglers, here, and get as far as slow start, I think, about 10 minutes in: https://www.youtube.com/watch?v=TWViGcBlnm0&t=510s

ALL NETWORKS have perceptible jitter due to this, unless you rigorously apply FQ and AQM techniques to each slower hop on the path.
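To make the slow start overshoot concrete, here is a minimal sketch of a toy model, not a real TCP implementation. It assumes the congestion window simply doubles every RTT, that the path can hold the bandwidth-delay product plus one bottleneck buffer, and that the first drop happens when the window exceeds that. The function name, parameters, and the initial window of 10 packets are all illustrative assumptions.

```python
def slow_start_overshoot(bdp_packets, buffer_packets, initial_cwnd=10):
    """Toy model: double cwnd each RTT until the path overflows.

    Returns (rtts_until_drop, cwnd_at_drop, packets_over_capacity).
    """
    # Packets the path can hold: in flight on the wire plus queued in the buffer.
    capacity = bdp_packets + buffer_packets
    cwnd = initial_cwnd
    rtts = 0
    # Slow start keeps doubling as long as nothing has been dropped yet.
    while cwnd <= capacity:
        cwnd *= 2
        rtts += 1
    return rtts, cwnd, cwnd - capacity


# Hypothetical path: 100 packets of BDP and a 100-packet buffer.
rtts, cwnd, excess = slow_start_overshoot(100, 100)
print(f"drop after {rtts} RTTs, cwnd={cwnd}, {excess} packets over capacity")
```

In this model the buffer is already completely full by the time the drop happens, which is the brief saturation described above. A bigger buffer only delays the drop and adds more queueing delay in the meantime.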

Fatter tests like Waveform's show the common bufferbloat scenario in ways humans can see better. But slow start overshoot is always there, on any connection that lasts long enough, which only takes a couple of RTTs. You can clearly see Netflix doing you in here, for example...

https://www.youtube.com/@trendaltoews7143

(except that in this case, the link has LibreQoS and FQ on it, so all that buffering is just local to the Netflix flow, invisible to everything else)

Nice videos, Dave. I guess I have personally given up on effective buffer management. Perhaps if IPv6 and InfiniBand become the underlying infrastructure? There are just so many layers of abstraction hiding no-longer-useful decisions in the stack that I have decided to leave infra and networking behind for a while, to see if one can make a difference elsewhere.
Don't give up. Ask for RFC7567 everywhere.
Sorry, I think you are thinking of something else. Maybe a railroad crossing (:-))

Joking aside, the https://www.waveform.com/tools/bufferbloat test checks whether the networking software is working correctly by putting a large load on the network and then seeing if other streams are affected by the overload.

The example on the https://libreqos.io/ home page is of
* good software delivering 9 and 23 milliseconds down/up latency at full load
* bad software delivering 106 and 517 milliseconds latency under load.

It is, in effect, a test for software failure under load.
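As a rough sketch of what "latency under load" means as a number: take RTT samples on an idle link, take more while a bulk transfer saturates it, and compare the medians. This is not how the Waveform test is implemented. The function name and the sample values are made up for illustration, with the loaded samples chosen to sit near the 9 ms and 106 ms figures quoted above and the idle baseline assumed.

```python
from statistics import median


def latency_increase_ms(idle_rtts_ms, loaded_rtts_ms):
    """Median RTT under load minus median RTT when idle, in milliseconds."""
    return median(loaded_rtts_ms) - median(idle_rtts_ms)


# Hypothetical samples; the idle baseline is an assumption, not from the thread.
idle = [5, 6, 5, 7, 6]
good_under_load = [9, 10, 8, 9, 11]        # FQ/AQM keeps the queue short
bad_under_load = [98, 106, 120, 104, 110]  # unmanaged buffer fills up

print("good:", latency_increase_ms(idle, good_under_load), "ms added")
print("bad: ", latency_increase_ms(idle, bad_under_load), "ms added")
```

The difference between the two results is queueing delay, which is what the test is looking for.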

Yeah that sounds like looking for fairness and effectiveness in buffer management in a shared environment. I was thinking of measuring latency of x.

The M1 cards I mentioned were notorious for having a 1GB (?) buffer and a massively oversubscribed packet processor, causing extreme jitter.
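A back-of-the-envelope sketch of why a buffer that size hurts: the worst-case queueing delay is roughly the buffer size divided by the rate at which it drains. The 1 GB figure is the uncertain one from above, and treating it as a single queue drained at one fixed rate is a simplifying assumption. The function name is made up for illustration.

```python
def max_queue_delay_s(buffer_bytes, drain_bps):
    """Time in seconds to drain a completely full buffer at the given bit rate."""
    return buffer_bytes * 8 / drain_bps


# Assumed 1 GB buffer, drained at the 10 Gbps port rate and at 1 Gbps.
for rate_bps in (10e9, 1e9):
    delay = max_queue_delay_s(1e9, rate_bps)
    print(f"{rate_bps / 1e9:.0f} Gbps drain: {delay:.1f} s worst-case queueing delay")
```

Under these assumptions a full buffer costs 0.8 s at 10 Gbps and 8 s at 1 Gbps, which is at least the same order of magnitude as the 1+ sec worst case mentioned earlier.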