Hacker News
by cletus 2253 days ago
So in a former life I worked on Google Fiber and, among other things, wrote a pure JS Speedtest (before Ookla had one, although theirs might've been in beta by then). It's still there (http://speed.googlefiber.net). This was necessary because Google Fiber installers use Chromebooks to verify installations and Chromebooks don't support Flash.

This is a surprisingly difficult problem, especially given the constraints of using pure JS. Some issues that spring to mind:

- The User-Agent is meaningless on iPhones, basically because Steve Jobs got sick of leaking new models in Apache logs. There are other ways of figuring this out but it's a huge pain.

- Send too much traffic and you can crash the browser, particularly on mobile devices;

- To maximize throughput it became necessary to use a range of ports and simultaneously communicate on all of them. This in turn could be an issue with firewalls;

- Run the test too long and performance in many cases would start to degrade;

- Send too much traffic and you could understate the connection speed;

- Sending larger blobs tended to be better for measuring throughput but too large could degrade performance or crash the browser. Of course, what "too large" was varied by device;

- HTTPS was abysmal for raw throughput on all but the beefiest of computers;

- To get the best results you needed to turn off a bunch of stuff like Nagle's algorithm and any implicit gzip compression;

- You'd have to send random data to avoid caching even with careful HTTP headers that should've disabled caching.

And so on.
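To make a couple of the points above concrete (random payloads, implicit compression, adding up parallel streams), here is a minimal Python sketch. It is not the Speedtest code, which was pure JS; the function names and the 4 x 25 MB figures are made up for illustration. It only shows why random bytes defeat gzip-style compression and how an aggregate figure falls out of bytes moved and elapsed time.

```python
import os
import zlib


def make_payload(size_bytes: int) -> bytes:
    """Return random bytes: effectively incompressible and different on every call."""
    return os.urandom(size_bytes)


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (about 1.0 means compression gains nothing)."""
    return len(zlib.compress(data)) / len(data)


def throughput_mbps(total_bytes: int, elapsed_seconds: float) -> float:
    """Aggregate throughput in megabits per second."""
    return total_bytes * 8 / elapsed_seconds / 1_000_000


if __name__ == "__main__":
    random_payload = make_payload(1_000_000)
    zero_payload = bytes(1_000_000)  # worst case: trivially compressible
    print(f"random payload ratio:   {compression_ratio(random_payload):.3f}")
    print(f"all-zero payload ratio: {compression_ratio(zero_payload):.3f}")

    # Hypothetical run: four parallel streams, each moving 25 MB in 2 seconds.
    print(f"aggregate: {throughput_mbps(4 * 25_000_000, 2.0)} Mbit/s")
```

If something in the path compresses the all-zero payload, the "measured" speed ends up far higher than what the link actually carried, which is why the payload has to be random.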

Perhaps the most vexing issue, and one I was never able to pin down, was with Chrome on Linux. In certain circumstances (and I never figured out what exactly they were other than high throughput), Chrome on Linux would write the blobs it downloaded to /tmp (default behaviour) and never release them until you refreshed the Webpage. And no, there were no dangling references. The only clue this was happening was that Chrome would start spitting weird error messages to the console and those errors couldn't be trapped.

So pure JS could actually do a lot and I actually spent a fair amount of effort to get this to accurately show speeds up to 10G (I got up to 8.5G down and ~7G up on Chrome on a MBP).

But getting back to the article at hand, what you tend to find is how terribly TCP does with latency. A small increase in latency would have a devastating effect on reported speeds.
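A rough way to see the effect: a single TCP connection can have at most one window of data in flight per round trip, so throughput is capped at window / RTT. The sketch below assumes a fixed 64 KiB window with no window scaling and no packet loss, which is a simplification chosen for illustration; real stacks grow the window, but the inverse dependence on RTT stays.

```python
def max_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one connection: one full window delivered per round trip."""
    return window_bytes * 8 / rtt_seconds / 1_000_000


WINDOW_BYTES = 64 * 1024  # assumed fixed window, no window scaling

if __name__ == "__main__":
    for rtt_ms in (10, 80, 300):
        bound = max_throughput_mbps(WINDOW_BYTES, rtt_ms / 1000)
        print(f"RTT {rtt_ms:>3} ms -> at most {bound:.2f} Mbit/s")
```

Under these assumptions the cap is roughly 52 Mbit/s at 10 ms, 6.6 Mbit/s at 80 ms and 1.7 Mbit/s at 300 ms, regardless of how fast the underlying link is.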

Anyone from Australia should be intimately familiar with this as it's clear (at least to me) that many if not most services are never tested on or designed for high-latency networks. 300ms RTT vs <80ms can be the difference between a relatively snappy SPA and something that is utterly unusable due to serial loads and excessive round trips.
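The serial-load problem is simple arithmetic: every request that has to wait for the previous one costs at least one full round trip. A small sketch, where the figure of 20 dependent round trips is a made-up example rather than a measurement of any real page:

```python
def serial_wait_seconds(round_trips: int, rtt_ms: float) -> float:
    """Minimum time spent just waiting on the network for dependent requests."""
    return round_trips * rtt_ms / 1000


if __name__ == "__main__":
    ROUND_TRIPS = 20  # hypothetical count of serial round trips for one page load
    for rtt_ms in (80, 300):
        wait = serial_wait_seconds(ROUND_TRIPS, rtt_ms)
        print(f"RTT {rtt_ms} ms: at least {wait:.1f} s of pure latency")
```

That works out to about 1.6 s at 80 ms versus 6 s at 300 ms, before any bytes of actual content are counted.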

So looking at this article, the first thing I searched for was the word "latency" and I didn't find it. Now sure, the idea of a CDN like Cloudflare is to have a POP close to most customers but that just isn't always possible. Plus you hit things not in the CDN. Even DNS latency matters here: people have shown meaningful improvements in Web performance by just having a hot cache of likely DNS lookups.

The degradation in throughput in TCP that comes from latency is well-known academically. It just doesn't seem to be known about, given attention to or otherwise catered for in user-facing services. Will HTTP/3 help with this? I have no idea. But I'd like to know before someone dismisses it as having minimal improvements or, worse, as degrading performance.

2 comments

> - Send too much traffic and you can crash the browser, particularly on mobile devices;

Surprised to hear that. Sending data should never lead to a crash. Even an aborted request wouldn't be great. When was that? Hope these things got fixed.

They did mention multiple geographic locations as well as RTT (Round Trip Time) which is somewhat equivalent to latency, no?
The challenge to control for is that they used WebPageTest, which tends to have locations in data centers near where they do. Using the traffic shaping options can add latency but what you really want is random latency and packet loss to simulate real-world usage.