by toast0 313 days ago
I mean, I did a speed test with t-mobile 5g home internet, download speed was impressive, but so was the difference in ping time during the download vs otherwise.

Sure, wireless is complex, but there were definitely some way too big buffers in the path. Add in some difficulty integrating their box into my network, and it wasn't for me.

Fair enough, I concede to your assessment. My understanding of bufferbloat (which I have to relearn every time I look at it) is that the telltale sign is that a ping to any destination that traverses the uplink exhibits higher latency than usual when you're saturating your download. It's just a tricky thing to test given the variability of conditions (and what might be deemed expected operation), which is why I'm usually hesitant and sceptical, and I don't trust those speedtest websites to gauge it properly.
Every speed test I tried that measures latency under load shows a large difference between fq_codel on and off.
This is very much a problem ISPs have to deal with as big pipes feed small pipes.
What do ISPs actually have to diagnose these issues, outside of sketchy speedtest websites and vague reports or concerns from customers? What about placing probes in the correct places (e.g. in conditions where there is no additional loss or introduced latency between the end user and uplink)? Also is this an actual problem that users are really having, or is it perceived because some benchmark / speedtest gave you a score.

There are a lot of issues and variables at play; this isn't a case of "it's always DNS". What tools do ISPs even have at their disposal, how accurate are they, and do they uncover the actual problem users are experiencing? This is the real issue that ISPs of all sizes have to deal with.

You don’t need a speed test website to see this problem. Just run a ping on your own while doing a big download that saturates your connection, and bufferbloat will happen unless there is some active queue management to prevent the ping packets from waiting in the queue behind the download packets. This happens any time there is a fast/slow transition in the internet and the slow connection cannot keep up. To prevent packet loss, the packets are buffered, which works well for short spikes, but prolonged activity results in a noticeable backlog. If buffers are allowed to be sufficiently big, you can get arbitrarily long delays, which are visible in ping times.
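
To make the "ping while downloading" check concrete, here is a minimal Python sketch. It assumes you have saved the text output of two ping runs, one on an idle connection and one during a saturating download, and that each reply line carries the usual `time=12.3 ms` field. The function names and the sample values in the comment are made up for illustration.

```python
import re
from statistics import median

# Matches the "time=12.3 ms" (or "time<1 ms") field that common ping tools print.
RTT_RE = re.compile(r"time[=<]([\d.]+)\s*ms")

def rtts_ms(ping_output: str) -> list[float]:
    """Extract round-trip times in milliseconds from captured ping output."""
    return [float(m.group(1)) for m in RTT_RE.finditer(ping_output)]

def latency_under_load(idle_output: str, loaded_output: str) -> float:
    """Median RTT increase (ms) from the idle run to the run under load."""
    return median(rtts_ms(loaded_output)) - median(rtts_ms(idle_output))

# Hypothetical usage with made-up sample lines, not real measurements:
#   idle   = "64 bytes from 192.0.2.1: icmp_seq=1 ttl=57 time=18.2 ms"
#   loaded = "64 bytes from 192.0.2.1: icmp_seq=1 ttl=57 time=412.0 ms"
#   latency_under_load(idle, loaded)
```

The median is used rather than the mean so that one stray slow reply on the idle run doesn't hide the effect. A large positive result suggests a queue is filling somewhere on the path, but it doesn't say where.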

The worst that I have ever seen was about 30 seconds when visiting a foreign country where bufferbloat was occurring at peering links. The bufferbloat in peering links is likely visible from western countries if you ping residential IPs in developing countries and monitor the ping times over days. Some parts of the day will have very high ping times while others will not. The high ping times will be the buffer bloat.

In most western countries, the bufferbloat typically occurs at people’s home internet connections. As in every case of bufferbloat, the solution is to be willing to drop packets when the connection is saturated. If you limit the bandwidth to just below what the connection can handle, the queue forms at a point you control, and you can do active queue management there to solve the problem.
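
As a rough illustration of "limit the bandwidth just below what the connection can handle", the Python sketch below only builds a `tc` command line that attaches CAKE with a `bandwidth` setting; it does not run it. The interface name, the measured rate and the 0.9 fraction are all assumptions (a later comment in this thread mentions about 90%), and the helper name is hypothetical.

```python
def cake_command(interface: str, measured_kbit: int, fraction: float = 0.9) -> list[str]:
    """Build (but do not run) a tc command that shapes `interface` with CAKE.

    `measured_kbit` is the link rate you actually measured; `fraction` is an
    assumed safety margin so the shaper, not the modem, becomes the bottleneck.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    shaped_kbit = round(measured_kbit * fraction)
    return ["tc", "qdisc", "replace", "dev", interface, "root",
            "cake", "bandwidth", f"{shaped_kbit}kbit"]

# Hypothetical usage: print the command for a 100 Mbit/s link on "eth0".
#   print(" ".join(cake_command("eth0", 100_000)))
```

A root qdisc like this only shapes traffic leaving that interface, so on a home router it covers one direction. Shaping the other direction needs extra setup that isn't shown here.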

That said, I suggest you stop posting replies. Your crusade against the idea of buffer bloat makes you look bad to anyone with enough networking knowledge to understand what bufferbloat is. I also strongly suspect I wrote an explanation that you will take zero time to understand and rather than take my advice, you will post another reply to continue your crusade. :/

You are 100% correct.

It is not yet a "solved" problem, but the past 10-15 years have started to make a dent, with better tools to both observe and act on the problem.

This is seen everywhere, from the inclusion of CAKE ( https://man7.org/linux/man-pages/man8/tc-cake.8.html ) in some CPE / home routers to the use of fq_codel ( https://man7.org/linux/man-pages/man8/tc-fq_codel.8.html ) in routers along the way.

Other ISPs have to go even farther, because "content" might be 80-120ms away, and the ability to be more aggressive or less aggressive in tuning certain parameters can have a large impact on overall customer Quality of Experience. If there are any LEO hops along the way, problems with TCP and delayed signaling as a byproduct can also make throughput tank while latency spikes.

DPDK and VPP have contributed to a lot of new networking devices to help observe and act on traffic.

Every time you go from a big pipe to a small pipe (higher data rate to lower data rate), you will see this issue at varying levels.

Do you have links to information on what DPDK and VPP are doing in the area of bufferbloat? I have not kept up with them since I cannot use them in my day to day life, but I would love to update myself on the subject.

By the way, when I wrote in another comment that bufferbloat was solved, I meant it in the same way that IPv4 exhaustion is solved by IPv6. We have the ability to deploy solutions that largely fix things, but whether we do is another matter. You are right to say that the past 10-15 years have started to make a dent in the problem. I had not meant to suggest otherwise.

Thanks for the reply and for confirming what I had already said earlier regarding detecting the telltale signs of bufferbloat. In case you weren't aware, a controlled experiment that exhibits bufferbloat doesn't translate to users being materially affected.

> The worst that I have ever seen was about 30 seconds when visiting a foreign country where bufferbloat was occurring at peering links. The bufferbloat in peering links is likely visible from western countries if you ping residential IPs in developing countries and monitor the ping times over days. Some parts of the day will have very high ping times while others will not. The high ping times will be the buffer bloat.

Out of curiosity, did you have full observability of these peering links, or is this a hypothesis? I could think of a few scenarios where alternative explanations could explain what you're seeing.

> In most western countries, the bufferbloat typically occurs at people’s home internet connections.

Says who? How is this measured? Do we have actual numbers on people experiencing real bufferbloat issues that are affecting their service?

> That said, I suggest you stop posting replies. Your crusade against the idea of buffer bloat makes you look bad to anyone with enough networking knowledge to understand what bufferbloat is. I also strongly suspect I wrote an explanation that you will take zero time to understand and rather than take my advice, you will post another reply out of ignorance. :/

Look, I will cordially suggest a more tenable approach: consider disengaging from this thread, your vacuous and vapid post hasn't really brought anything to the table.

Edit: Seems I can't reply to the child comment, so I'll just say, you should've taken your own advice and not replied. There's nothing of substance and you're still continuing with your daft misinterpretation of my take. I'll leave it at that.

Overanalysis for the sake of denying the existence of whatever you want is cliche. It does not matter how complete the information on a subject is, since you will just post more pointless questions, whose relevance is specious, for the sake of claiming there are non-existent issues in understanding. The last time I saw this used involved a very loquacious guy who denied Darwin’s theory of evolution. It can also be used to claim the world is flat.

I was being generous by advising you to stop posting, since the more you post asinine things, the worse you look. In the past, I have taken the liberty to do amateur psychoanalysis of people who post bizarre things online based on a psychology class I took in college. If I keep responding, it will only be to get you to post more so that I can work out what is wrong with you for my own curiosity. I am probably not the only one thinking this.

> Also is this an actual problem that users are really having, or is it perceived because some benchmark / speedtest gave you a score.

The actual problem is I'm on a voip call and someone starts a big download (steam) and latency and jitter go to hell and the call is unusable. Bufferbloat test confirms that latency dramatically increases under load. Or same call but someone starts uploading something big.

If troublesome buffers are at the last mile connection and the ISP provides a modem/router, adding QoS limiting downloads and uploads to about 90% of the achieved physical connection rate will avoid the issue. The buffers are still too big, but they won't fill under normal conditions, so it's not a problem. You could still fill the buffers if there's a big flow that doesn't use effective congestion control, or a large enough number of flows that the minimum send rate is still too much, or when the physical connection rate changes, but it's good enough. Many ISPs do this, and so you hear a lot less complaining about bufferbloat on say Comcast these days; also, this is an effective best practice, so less need for papers, reports and case studies... it's a matter of getting the practices in the wild and maybe figuring out how to do it better for wireless systems with rapidly changing rates.
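
A toy model of why the 90% cap keeps the oversized buffer empty, as a Python sketch. It assumes a constant arrival rate and a single FIFO buffer, and it ignores congestion control entirely, so it only illustrates the arithmetic and is not a simulation of real traffic. The function name and all numbers in the usage comment are made up.

```python
def peak_queue_delay_ms(arrival_mbit: float, link_mbit: float,
                        buffer_kb: float, seconds: float,
                        step_s: float = 0.01) -> float:
    """Peak queueing delay (ms) of one FIFO buffer fed at a constant rate.

    The queue grows by (arrival - link) each step and is clamped between
    empty and the buffer size. Delay is queued bits divided by the link rate.
    """
    capacity_bits = buffer_kb * 8000.0  # 1 kB = 8000 bits
    queue_bits = 0.0
    peak_bits = 0.0
    for _ in range(round(seconds / step_s)):
        queue_bits += (arrival_mbit - link_mbit) * 1e6 * step_s
        queue_bits = min(max(queue_bits, 0.0), capacity_bits)
        peak_bits = max(peak_bits, queue_bits)
    return peak_bits / (link_mbit * 1e6) * 1000.0

# Hypothetical usage: a 100 Mbit/s link with a 1250 kB buffer.
#   peak_queue_delay_ms(110, 100, 1250, 10)  # arrivals exceed the link rate
#   peak_queue_delay_ms(90, 100, 1250, 10)   # arrivals capped at 90%
```

In this model the buffer never holds anything while arrivals stay below the link rate, which matches the point above: the buffer is still too big, it just doesn't fill.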

Otherwise, ISP visibility can be limited. Not all equipment will report on buffer use, and even if it does, it may not report on a per port basis, and even then, the timing of measurement might miss things. What you're looking for is a 'standing buffer' where a port always has at least N packets waiting and the buffer does not drain for a meaningful amount of time. Ideally, you'd actually measure the buffer length in milliseconds, rather than packets, but that's asking a lot of the equipment.
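
The two ideas above, a 'standing buffer' and buffer length in milliseconds rather than packets, can be sketched in Python as follows. This assumes you already have periodic queue-depth samples for one port, which as noted not all equipment provides. The function names, thresholds and the 1500-byte packet size are illustrative assumptions.

```python
from typing import Iterable

def queue_delay_ms(packets: int, link_mbit: float, packet_bytes: int = 1500) -> float:
    """Convert a queue depth in packets to drain time in ms at the link rate.

    Assumes every packet is `packet_bytes` long, which real traffic is not.
    """
    return packets * packet_bytes * 8 / (link_mbit * 1e6) * 1000.0

def has_standing_queue(depth_samples: Iterable[int], min_packets: int,
                       min_consecutive: int) -> bool:
    """True if at least `min_consecutive` samples in a row hold >= `min_packets`."""
    run = 0
    for depth in depth_samples:
        run = run + 1 if depth >= min_packets else 0
        if run >= min_consecutive:
            return True
    return False

# Hypothetical usage with made-up samples from one port:
#   has_standing_queue([0, 5, 60, 70, 80, 3], min_packets=50, min_consecutive=3)
#   queue_delay_ms(1000, link_mbit=100)
```

A short burst that fills and then drains the queue resets the run counter, so it isn't flagged. Whether the sampling interval is fine enough to catch real standing queues is the timing caveat already mentioned.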

There's a balance to be met as well. Smaller buffers mean packet drops, which is appropriate when dealing with standing buffers; but buffers that are too small lead to problems if your flows are prone to 'micro bursts', lots of packets at once potentially on many flows, and then calm for a while. It's better to have room to buffer those.

Rate limiting the CPE doesn't seem to really impact the buffer queue depth on the 100G upstream switch feeding the 1G customer port. In addition, sticking them on something like 90% customer speed plan or 90% port speed also doesn't help, and in fact with many customers, they are now pissed because they never hit their plan speeds in a speed test.

Something I have always done is provision to account for packet overhead, so you might see speeds 2-3% higher than your plan limit in a speed test, but psychologically the customer is getting more than they paid for, and most seem to be very happy about that.

But rate limits were already in place long before anything about queue depth was even discussed, so that was nothing new. CAKE OTOH has had a very noticeable impact on the customer experience, when their kid's Xbox can download that 250G update without impacting the VoIP call or Wi-Fi offloading that another member of the household is on. Alternatively, that same gamer can play while Mom is downloading something near max throughput without having latency spikes and packet loss.

Yes, you're on to something about the customer experience in general that I'm tracking down myself. Orb is also trying to get a look, but I'm not a fan so far of that tool/platform https://orb.net/

codel/CAKE also came from that project, no middlebox needed.