What do ISPs actually have to diagnose these issues, outside of sketchy speedtest websites and vague reports or concerns from customers? What about placing probes in the correct places (e.g. where there is no additional loss or introduced latency between the end user and the uplink)? Also, is this an actual problem that users are really having, or is it only perceived because some benchmark / speedtest gave you a score?
There are a lot of issues and variables at play; this isn't a case of "it's always DNS". What tools do ISPs even have at their disposal, how accurate are they, and do they uncover the actual problem users are experiencing? This is the real issue that ISPs of all sizes have to deal with.
You don’t need a speed test website to see this problem. Just run a ping on your own while doing a big download that saturates your connection and bufferbloat will happen unless there is some active queue management to prevent the ping packets from waiting in the queue behind the download packets. This happens anytime that there is a fast/slow transition in the internet and the slow connection cannot keep up. To prevent packet loss, the packets will be buffered, which works well for short spikes, but prolonged activity will result in a noticeable backlog and if buffers are allowed to be sufficiently big, you can get arbitrarily long delays, which are visible in ping times.
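To make the buffering argument concrete, here is a minimal sketch of a drop-tail FIFO at a fast/slow transition. The function name, the 100 Mbit/s and 50 Mbit/s rates, and the buffer sizes are assumptions chosen for illustration, and the sender is modelled as a constant-rate source that never backs off, which is cruder than real TCP.

```python
def fifo_delays_ms(arrival_mbps, drain_mbps, buffer_bytes, seconds, step_ms=1):
    """Simulate a drop-tail FIFO and return the queueing delay (ms) after each step.

    Hypothetical model: a constant-rate sender feeds a slower link, and
    anything that does not fit in the buffer is dropped.
    """
    arrive = arrival_mbps * 1e6 / 8 * step_ms / 1000  # bytes arriving per step
    drain = drain_mbps * 1e6 / 8 * step_ms / 1000     # bytes leaving per step
    queued = 0.0
    delays = []
    for _ in range(int(seconds * 1000 / step_ms)):
        queued = min(buffer_bytes, queued + arrive)   # drop-tail: excess is lost
        queued = max(0.0, queued - drain)
        # A newly arriving packet waits for everything ahead of it to drain.
        delays.append(queued * 8 / (drain_mbps * 1e6) * 1000)
    return delays


if __name__ == "__main__":
    for size in (100_000, 1_000_000):
        final = fifo_delays_ms(100, 50, size, seconds=1.0)[-1]
        print(f"{size} byte buffer: {final:.0f} ms of added delay")
```

With these assumed numbers the standing delay settles at roughly 15 ms for the 100 kB buffer and roughly 159 ms for the 1 MB buffer, which is the extra time a ping packet would spend queued behind the download.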
The worst that I have ever seen was about 30 seconds when visiting a foreign country where bufferbloat was occurring at peering links. The bufferbloat in peering links is likely visible from western countries if you ping residential IPs in developing countries and monitor the ping times over days. Some parts of the day will have very high ping times while others will not. The high ping times will be the buffer bloat.
In most western countries, the bufferbloat typically occurs at people’s home internet connections. As in every case of bufferbloat, the solution is to be willing to drop packets when the connection is saturated. If you limit the bandwidth to just below what the connection can handle, the queue forms on a device you control, and you can do active queue management there to solve the problem.
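As a rough sketch of what "being willing to drop packets" can look like, the class below drops a packet only once its time in the queue (sojourn time) has stayed above a target for a full interval. This is a deliberately simplified, CoDel-inspired rule and not the actual CoDel or CAKE control law; the class name is hypothetical, and the 5 ms target and 100 ms interval are assumed values.

```python
class SojournAqm:
    """Simplified sojourn-time AQM: tolerate short bursts, drop on a standing queue."""

    def __init__(self, target_ms=5.0, interval_ms=100.0):
        self.target_ms = target_ms      # acceptable time spent in the queue
        self.interval_ms = interval_ms  # how long the target may be exceeded
        self.first_above_ms = None      # when the delay first went above target

    def should_drop(self, now_ms, sojourn_ms):
        """Return True if the packet being dequeued at now_ms should be dropped."""
        if sojourn_ms < self.target_ms:
            self.first_above_ms = None  # queue drained: forget the episode
            return False
        if self.first_above_ms is None:
            self.first_above_ms = now_ms  # start timing this episode
            return False
        return now_ms - self.first_above_ms >= self.interval_ms
```

The point of the design is that a short spike never triggers a drop, while a queue that refuses to drain does, which signals the sender to slow down before the delay grows.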
That said, I suggest you stop posting replies. Your crusade against the idea of buffer bloat makes you look bad to anyone with enough networking knowledge to understand what bufferbloat is. I also strongly suspect I wrote an explanation that you will take zero time to understand and rather than take my advice, you will post another reply to continue your crusade. :/
Other ISPs have to go even farther, because "content" might be 80-120ms away, and the ability to be more aggressive or less aggressive in tuning certain parameters can have a large impact on overall customer Quality of Experience. If there are any LEO hops along the way, problems with TCP and delayed signaling as a byproduct can also make throughput tank while latency spikes.
DPDK and VPP have contributed to a lot of new networking devices to help observe and act on traffic.
Every time you go from a big pipe to a small pipe (a higher data rate to a lower data rate), you will see this issue at varying levels.
Do you have links to information on what DPDK and VPP are doing in the area of bufferbloat? I have not kept up with them since I cannot use them in my day to day life, but I would love to update myself on the subject.
By the way, when I wrote in another comment that bufferbloat was solved, I meant it in the same way that IPv4 exhaustion is solved by IPv6. We have the ability to deploy solutions that largely fix things, but whether we do is another matter. You are right to say that the past 10-15 years have started to make a dent in the problem. I had not meant to suggest otherwise.
Thanks for the reply and for confirming what I had already said earlier regarding detecting telltale signs of bufferbloat. In case you weren't aware, a controlled experiment that exhibits bufferbloat doesn't translate to users being materially affected.
> The worst that I have ever seen was about 30 seconds when visiting a foreign country where bufferbloat was occurring at peering links. The bufferbloat in peering links is likely visible from western countries if you ping residential IPs in developing countries and monitor the ping times over days. Some parts of the day will have very high ping times while others will not. The high ping times will be the buffer bloat.
Out of curiosity, did you have full observability of these peering links, or is this a hypothesis? I could think of a few scenarios where alternative explanations could explain what you're seeing.
> In most western countries, the bufferbloat typically occurs at people’s home internet connections.
Says who? How is this measured? Do we have actual numbers on people experiencing real bufferbloat issues that are affecting their service?
> That said, I suggest you stop posting replies. Your crusade against the idea of buffer bloat makes you look bad to anyone with enough networking knowledge to understand what bufferbloat is. I also strongly suspect I wrote an explanation that you will take zero time to understand and rather than take my advice, you will post another reply out of ignorance. :/
Look, I will cordially suggest a more tenable approach: consider disengaging from this thread, your vacuous and vapid post hasn't really brought anything to the table.
Edit: Seems I can't reply to the child comment, so I'll just say, you should've taken your own advice and not replied. There's nothing of substance and you're still continuing with your daft misinterpretation of my take. I'll leave it at that.
Overanalysis for the sake of denying the existence of whatever you want is cliche. It does not matter how complete the information on a subject is, since you will just post more pointless questions, whose relevance is specious, for the sake of claiming there are non-existent issues in understanding. The last time I saw this used involved a very loquacious guy who denied Darwin’s theory of evolution. It can also be used to claim the world is flat.
I was being generous by advising you to stop posting, since the more you post asinine things, the worse you look. In the past, I have taken the liberty to do amateur psychoanalysis of people who post bizarre things online based on a psychology class I took in college. If I keep responding, it will only be to get you to post more so that I can work out what is wrong with you for my own curiosity. I am probably not the only one thinking this.
Look, let's call this what it is: gatekeeping. Furthermore, you deflect and avoid answering a real question. I don't think you actually understood the crux of what I'm saying and instead resorted to ad hominems and gatekeeping, but seeing as it went over your head, I will pose the question: does bufferbloat have more than a marginal effect on the Internet experience of end users in real world conditions (not in a controlled experiment), and does it affect a significant population, as of today in the 2020s as opposed to circa 2010? I'm saying no to both; a good way to gauge whether it is still relevant is to look for publications in networking conferences and journals or even discussions by the *NOG, and really it's just not there. I know there's obsession over CoDel etc. and I used to follow the late Dave Taht's evangelising about the issue, but put simply the numbers don't add up - anyways, a simpler solution would simply be to prioritise ICMP and UDP flows over TCP. Anyways, this is not your imagined crusade against bufferbloat, it's just a pragmatic assessment. I'll leave it at that. Rather than deflect and attack, consider applying some emotional intelligence.
The problem of buffer bloat is one of many issues that affect internet users. When I visited China and pings to my VPN in the US jumped from 200ms to 30 seconds depending on the time of day, bufferbloat was severely affecting me. That could only be described as bufferbloat, since the packets were suffering from store and forward overhead to an excessive degree and my pings were able to measure it across times of day.
Historically and likely still in the present day (but not in my household as we use AQM now), whenever one person in a household does a large download, internet latencies shoot up for everyone in the household, which is also bufferbloat. Having to wait hundreds of ms per round trip brings us back to the 56k dialup days and the performance impact on interactive traffic is horrific. It is enough to make VoIP unusable. As others have told you, there can be other issues at the same time, but bufferbloat makes the issues worse. I cannot speak for others on the extent to which they are afflicted by buffer bloat, but adopting AQM had a night and day difference in performance of the internet connection in my house, since I often do big data transfers that previously would slow down basic web browsing for everyone in my house, myself included.
As for your conjecture that extant problems are visible in recent journal publications, journals have a selection bias. The idea that a problem’s existence is indicated by the degree to which people are publishing papers on it in journals is fallacious since the papers need to not just provide something new, but also be interesting to those running the journal (i.e. make them think that the papers would elevate the status of their journal and help them get increased readership, provided that they are not a junk journal that will publish literally anything). On top of that, the work needs to be funded. Bufferbloat, which is largely considered a solved problem and which predominantly affects the less affluent these days, is not something that will get much attention in journals since nobody in academia seeks funding for something that they do not think they can improve or publish.
Finally, I did not use any ad hominem remarks toward you, as my remarks had focused entirely on what you wrote. I did write that any further replies would likely be done to get you to keep talking so I can play my old game of “figure out what is wrong with someone posting bizarre things on the internet”. About 30% of the population is mentally ill and thus when someone is posting bizarre things online, it is often the result of mental illness. Figuring out which mental illness is often the only reason responding to bizarre posts is worthwhile (as it is both an intellectual challenge and a public service). This contradicts your remarks suggesting that there is no point to my replies, to use my words rather than yours. It is not an ad hominem remark to say that I am likely to do this analysis. Posting the results of the analysis would be, but it would be grounded in fact and would likely be done to suggest professional help for X condition, if my amateur analysis identifies a condition that could benefit from professional help. Honestly, I think the world would be a better place if more people who studied psychology (even 1 class like I did) played armchair psychologist when others persist in a pattern of bizarre remarks and refer those who need professional help to trained professionals.
> Also is this an actual problem that users are really having, or is it perceived because some benchmark / speedtest gave you a score.
The actual problem is I'm on a voip call and someone starts a big download (steam) and latency and jitter go to hell and the call is unusable. Bufferbloat test confirms that latency dramatically increases under load. Or same call but someone starts uploading something big.
If troublesome buffers are at the last mile connection and the ISP provides a modem/router, adding QoS limiting downloads and uploads to about 90% of the achieved physical connection rate will avoid the issue. The buffers are still too big, but they won't fill under normal conditions, so it's not a problem. You could still fill the buffers if there's a big flow that doesn't use effective congestion control, or a large enough number of flows that the minimum send rate is still too much, or when the physical connection rate changes, but it's good enough. Many ISPs do this, and so you hear a lot less complaining about bufferbloat on say Comcast these days; also, this is an effective best practice, so less need for papers, reports and case studies... it's a matter of getting the practices in the wild and maybe figuring out how to do it better for wireless systems with rapidly changing rates.
Otherwise, ISP visibility can be limited. Not all equipment will report on buffer use, and even if it does, it may not report on a per port basis, and even then, the timing of measurement might miss things. What you're looking for is a 'standing buffer' where a port always has at least N packets waiting and the buffer does not drain for a meaningful amount of time. Ideally, you'd actually measure the buffer length in milliseconds, rather than packets, but that's asking a lot of the equipment.
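A minimal sketch of the "standing buffer" check described above, assuming the equipment can export periodic per-port queue depth in bytes. The function names, the 5 ms delay threshold, and the 200 ms duration threshold are hypothetical choices, not values taken from any vendor's telemetry, and coarse sampling can still miss events between samples.

```python
def queue_delay_ms(queued_bytes, link_bps):
    """Convert a queue depth in bytes into the time needed to drain it, in ms."""
    return queued_bytes * 8 / link_bps * 1000


def has_standing_queue(samples, link_bps, min_delay_ms=5.0, min_duration_ms=200.0):
    """Return True if the queue stayed above min_delay_ms for min_duration_ms.

    samples: list of (timestamp_ms, queued_bytes) tuples, sorted by time.
    """
    run_start = None
    for timestamp_ms, queued_bytes in samples:
        if queue_delay_ms(queued_bytes, link_bps) >= min_delay_ms:
            if run_start is None:
                run_start = timestamp_ms  # queue just rose above the threshold
            if timestamp_ms - run_start >= min_duration_ms:
                return True
        else:
            run_start = None  # queue drained: this was a burst, not a standing queue
    return False
```

Expressing the depth in milliseconds rather than packets makes one threshold meaningful across ports of different speeds, and resetting on every drain keeps micro bursts from being flagged.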
There's a balance to be met as well. Smaller buffers mean packet drops, which is appropriate when dealing with standing buffers; but too small of buffers leads to problems if your flows are prone to 'micro bursts', lots of packets at once potentially on many flows, and then calm for a while. It's better to have room to buffer those.
Rate limiting the CPE doesn't seem to really impact the buffer queue depth on the 100G upstream switch feeding the 1G customer port. In addition, sticking them on something like 90% customer speed plan or 90% port speed also doesn't help, and in fact with many customers, they are now pissed because they never hit their plan speeds in a speed test.
Something I have always done is provision to account for packet overhead, so you might see speeds 2-3% higher than your plan limit in a speed test. Psychologically, the customer is getting more than they paid for, and most seem to be very happy about that.
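One way to arrive at a figure in that range, as a back-of-the-envelope sketch: assume the shaper counts IP packet bytes while the speed test reports TCP payload, with a 1500-byte MTU and 40 bytes of IPv4 and TCP headers without options. Those assumptions and the function name are mine for illustration; the right overhead depends on where a given platform measures the rate.

```python
def provisioned_rate_mbps(plan_mbps, mtu=1500, header_bytes=40):
    """Shaper rate needed for a payload-based speed test to show plan_mbps.

    Assumes full-size packets, so each mtu bytes counted by the shaper
    carry (mtu - header_bytes) bytes of payload.
    """
    payload_bytes = mtu - header_bytes
    return plan_mbps * mtu / payload_bytes


if __name__ == "__main__":
    rate = provisioned_rate_mbps(1000)
    extra = (rate / 1000 - 1) * 100
    print(f"Provision {rate:.1f} Mbit/s ({extra:.1f}% above the plan rate)")
```

Under these assumptions a 1000 Mbit/s plan needs roughly 1027 Mbit/s at the shaper, about 2.7% of headroom.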
But rate limits were already in place long before anything about queue depth was even discussed, so that was nothing new. CAKE, OTOH, has had a very noticeable impact on the customer experience: their kid's Xbox can download that 250G update without impacting the VoIP call or the Wi-Fi offloading that another member of the household is on. Alternatively, that same gamer can play while Mom is downloading something near max throughput without having latency spikes and packet loss.
Yes, you're on to something about the customer experience in general that I'm tracking down myself. Orb is also trying to get a look, but I'm not a fan so far of that tool/platform https://orb.net/