by commandersaki 313 days ago
Is it though, or is it just a scapegoat or a red herring, especially in the case of a wireless medium? That's been my experience with quick claims of bufferbloat: it's usually something else at play. But again, YMMV.

There's always something else at play. Bufferbloat hides problems from the systems that can easily solve them. It doesn't cause problems, it makes them worse.
I mean, I did a speed test with T-Mobile 5G home internet. The download speed was impressive, but so was the difference in ping time during the download vs otherwise.

Sure, wireless is complex, but there were definitely some way too big buffers in the path. Add in some difficulty integrating their box into my network, and it wasn't for me.

Fair enough, I concede to your assessment. My understanding of bufferbloat (which I have to relearn every time I look at it) is that the telltale sign is that ping to any destination that traverses the uplink exhibits higher latency than usual when you're saturating your download. It's just a tricky thing to test given the variability of conditions (and what might be deemed expected operation), which is why I'm usually hesitant and sceptical, and I don't trust those speedtest websites to gauge it properly.
Every speed test I tried that measures latency under load shows a large difference between fq_codel on and off.
This is very much a problem ISPs have to deal with as big pipes feed small pipes.
What do ISPs actually have to diagnose these issues, outside of sketchy speedtest websites and vague reports or concerns from customers? What about placing probes in the correct places (e.g. where there is no additional loss or introduced latency between the end user and uplink)? Also is this an actual problem that users are really having, or is it perceived because some benchmark / speedtest gave you a score.

There are a lot of issues and variables at play; this isn't a case of "it's always DNS". What tools do ISPs even have at their disposal, how accurate are they, and do they uncover the actual problem users are experiencing? This is the real issue that ISPs of all sizes have to deal with.

You don’t need a speed test website to see this problem. Just run a ping on your own while doing a big download that saturates your connection and bufferbloat will happen unless there is some active queue management to prevent the ping packets from waiting in the queue behind the download packets. This happens anytime that there is a fast/slow transition in the internet and the slow connection cannot keep up. To prevent packet loss, the packets will be buffered, which works well for short spikes, but prolonged activity will result in a noticeable backlog and if buffers are allowed to be sufficiently big, you can get arbitrarily long delays, which are visible in ping times.
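As a rough back-of-the-envelope check on how a big buffer turns into long ping times: the delay seen by a packet arriving at the back of a queue is the queued bytes divided by the rate of the slow link. A minimal sketch in Python; the function name and the example buffer and link figures are made up for illustration, not measurements from this thread.

```python
def queue_delay_ms(queued_bytes: int, link_rate_bps: float) -> float:
    """Time (ms) for the slow link to drain everything queued ahead of a new packet.

    Assumes a single FIFO queue with no active queue management,
    which is the situation described above.
    """
    return queued_bytes * 8 / link_rate_bps * 1000.0


if __name__ == "__main__":
    # Hypothetical: a full 1 MiB buffer in front of a 10 Mbit/s link.
    print(f"{queue_delay_ms(1_048_576, 10_000_000):.0f} ms")
```

With those assumed numbers a ping stuck behind a full buffer waits roughly 839 ms, and the delay grows linearly with the buffer size, which is why it can get arbitrarily long.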

The worst that I have ever seen was about 30 seconds when visiting a foreign country where bufferbloat was occurring at peering links. The bufferbloat in peering links is likely visible from western countries if you ping residential IPs in developing countries and monitor the ping times over days. Some parts of the day will have very high ping times while others will not. The high ping times will be the buffer bloat.

In most western countries, the bufferbloat typically occurs at people’s home internet connections. As in all cases of bufferbloat, the solution is to be willing to drop packets when the connection is saturated. If you limit the bandwidth to just below what the connection can handle, you can do active queue management to solve the problem.

That said, I suggest you stop posting replies. Your crusade against the idea of buffer bloat makes you look bad to anyone with enough networking knowledge to understand what bufferbloat is. I also strongly suspect I wrote an explanation that you will take zero time to understand and rather than take my advice, you will post another reply to continue your crusade. :/

You are 100% correct.

It is not yet a "solved" problem, but 10-15 years of work have started to make a dent and produced better tools to both observe and act on the problem.

This is seen everywhere, from the inclusion of CAKE ( https://man7.org/linux/man-pages/man8/tc-cake.8.html ) in some CPE / home routers to the use of fq_codel ( https://man7.org/linux/man-pages/man8/tc-fq_codel.8.html ) in routers along the way.

Other ISPs have to go even farther, because "content" might be 80-120ms away, and the ability to be more aggressive or less aggressive in tuning certain parameters can have a large impact on overall customer Quality of Experience. If there are any LEO hops along the way, problems with TCP and delayed signaling as a byproduct can also make throughput tank while latency spikes.

DPDK and VPP have contributed to a lot of new networking devices to help observe and act on traffic.

Every time you go from a big pipe to a small pipe (a higher data rate connection to a lower data rate one) you will see this issue at varying levels.

Thanks for the reply and for confirming what I had already said earlier regarding detecting telltale signs of bufferbloat. In case you weren't aware, a controlled experiment that exhibits bufferbloat doesn't translate to users being materially affected.

> The worst that I have ever seen was about 30 seconds when visiting a foreign country where bufferbloat was occurring at peering links. The bufferbloat in peering links is likely visible from western countries if you ping residential IPs in developing countries and monitor the ping times over days. Some parts of the day will have very high ping times while others will not. The high ping times will be the buffer bloat.

Out of curiosity, did you have full observability of these peering links, or is this a hypothesis? I could think of a few scenarios where alternative explanations could explain what you're seeing.

> In most western countries, the bufferbloat typically occurs at people’s home internet connections.

Says who? How is this measured? Do we have actual numbers on people experiencing real bufferbloat issues that are affecting their service?

> That said, I suggest you stop posting replies. Your crusade against the idea of buffer bloat makes you look bad to anyone with enough networking knowledge to understand what bufferbloat is. I also strongly suspect I wrote an explanation that you will take zero time to understand and rather than take my advice, you will post another reply out of ignorance. :/

Look, I will cordially suggest a more tenable approach: consider disengaging from this thread, your vacuous and vapid post hasn't really brought anything to the table.

Edit: Seems I can't reply to the child comment, so I'll just say, you should've taken your own advice and not replied. There's nothing of substance and you're still continuing with your daft misinterpretation of my take. I'll leave it at that.

> Also is this an actual problem that users are really having, or is it perceived because some benchmark / speedtest gave you a score.

The actual problem is I'm on a VoIP call and someone starts a big download (Steam), and latency and jitter go to hell and the call is unusable. A bufferbloat test confirms that latency dramatically increases under load. The same thing happens on a call when someone starts uploading something big.
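For anyone who wants to check this without a speed test site, here is a minimal sketch of the comparison such a test makes: take ping round-trip times while the link is idle and again while a big transfer is running, then compare the medians. The function name and the sample values are hypothetical; collecting the RTTs (e.g. by running ping by hand) is left out.

```python
from statistics import median
from typing import Dict, Sequence


def latency_under_load(idle_rtts_ms: Sequence[float],
                       loaded_rtts_ms: Sequence[float]) -> Dict[str, float]:
    """Compare median RTT while idle against median RTT during a saturating transfer."""
    idle = median(idle_rtts_ms)
    loaded = median(loaded_rtts_ms)
    return {"idle_ms": idle, "loaded_ms": loaded, "added_ms": loaded - idle}


if __name__ == "__main__":
    # Hypothetical samples, not measurements from this thread.
    idle = [21, 19, 20, 22, 20]
    loaded = [180, 240, 210, 300, 260]
    print(latency_under_load(idle, loaded))
```

The median is used rather than the mean so a single stray slow ping doesn't dominate; a large "added_ms" that appears only under load is the symptom being described.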

If troublesome buffers are at the last-mile connection and the ISP provides a modem/router, adding QoS that limits downloads and uploads to about 90% of the achieved physical connection rate will avoid the issue. The buffers are still too big, but they won't fill under normal conditions, so it's not a problem. You could still fill the buffers if there's a big flow that doesn't use effective congestion control, or a large enough number of flows that the minimum send rate is still too much, or when the physical connection rate changes, but it's good enough. Many ISPs do this, so you hear a lot less complaining about bufferbloat on, say, Comcast these days. It's also an effective best practice, so there's less need for papers, reports and case studies... it's a matter of getting the practices into the wild and maybe figuring out how to do it better for wireless systems with rapidly changing rates.

Otherwise, ISP visibility can be limited. Not all equipment will report on buffer use, and even if it does, it may not report on a per port basis, and even then, the timing of measurement might miss things. What you're looking for is a 'standing buffer' where a port always has at least N packets waiting and the buffer does not drain for a meaningful amount of time. Ideally, you'd actually measure the buffer length in milliseconds, rather than packets, but that's asking a lot of the equipment.
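To make the 'standing buffer' idea concrete, here is a minimal sketch assuming you can get periodic queue-depth samples (in packets) for one port; real equipment may not expose that, as noted above. The function names, the threshold, and the sample data are all hypothetical.

```python
from typing import Sequence


def has_standing_queue(depths: Sequence[int], min_packets: int, window: int) -> bool:
    """True if `window` consecutive samples all show at least `min_packets` queued,
    i.e. the buffer never drained during that stretch."""
    run = 0
    for depth in depths:
        run = run + 1 if depth >= min_packets else 0
        if run >= window:
            return True
    return False


def depth_to_ms(packets: int, avg_packet_bytes: int, link_rate_bps: float) -> float:
    """Express a queue depth as drain time in milliseconds (assumes an average packet size)."""
    return packets * avg_packet_bytes * 8 / link_rate_bps * 1000.0


if __name__ == "__main__":
    standing = [0, 50, 60, 55, 70, 0]   # queue stays deep for four samples
    bursty = [0, 200, 0, 0, 180, 0]     # deep spikes that drain right away
    print(has_standing_queue(standing, min_packets=40, window=4))
    print(has_standing_queue(bursty, min_packets=40, window=4))
```

The bursty trace has deeper peaks than the standing one but drains between them, so it is not flagged. That is the distinction between a standing buffer and micro bursts.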

There's a balance to be struck as well. Smaller buffers mean packet drops, which is appropriate when dealing with standing buffers; but buffers that are too small lead to problems if your flows are prone to 'micro bursts': lots of packets at once, potentially on many flows, and then calm for a while. It's better to have room to buffer those.

Rate limiting the CPE doesn't seem to really impact the buffer queue depth on the 100G upstream switch feeding the 1G customer port. In addition, sticking them on something like 90% customer speed plan or 90% port speed also doesn't help, and in fact with many customers, they are now pissed because they never hit their plan speeds in a speed test.

Something I have always done is provision to account for packet overhead, so you might see 2-3% higher speeds than your plan limit in a speed test. Psychologically the customer is getting more than they paid for, and most seem to be very happy about that.

But rate limits were already in place long before anything about queue depth was even discussed, so that was nothing new. CAKE OTOH has had a very noticeable impact on the customer experience: their kid's Xbox can download that 250G update without impacting the VoIP call or Wi-Fi offloading that another member of the household is on. Alternatively, that same gamer can play while Mom is downloading something near max throughput without having latency spikes and packet loss.

Yes, you're on to something about the customer experience in general that I'm tracking down myself. Orb is also trying to get a look, but I'm not a fan so far of that tool/platform https://orb.net/

codel/CAKE also came from that project, no middlebox needed.