HN Mirror

by kylealden 3655 days ago

Edge platform team employee here. A couple of thoughts on methodology, focusing on our scripted test (under "stay productive longer" at https://blogs.windows.com/windowsexperience/2016/06/20/more-...), which is most comparable to Opera's test; nobody seems to challenge our Netflix results:

* We did not enable ad blockers, because we were testing the efficiency of the network stack, browser rendering pipeline, etc.; if you cut off ads, you're effectively skipping half the assignment. This is a valid way to improve the user experience, but not a valid way to measure browser efficiency, especially because it will disproportionately impact some sites (like the news sites that Opera's test focused on) and have no impact on other sites (like Netflix, where Edge demolishes the other browsers). It's basically skipping a lap of the race and then bragging that you finished faster.

* It's also worth noting that many news sites detect ad blockers and will actually refuse to load the main page content at all. Not sure if Opera's test accounted for this.

* Our test was designed to mimic real-world behavior: watching YouTube (foreground and background), shopping on Amazon, browsing the Facebook news feed, searching on Google, opening email in Gmail, and reading Wikipedia. To reduce variability, we used WebDriver (supported by all four browsers tested) to instrument the tests, and we gave each task a fixed amount of time rather than running a loop of back-to-back tasks. A loop could advantage or disadvantage browsers depending on other factors, like page-load performance or network conditions, in a way that doesn't reflect user experience, since a user is likely to linger on a page once it's loaded. We then used the Maxim 34407 power instrumentation built into the Surface Book (which is why we chose the Surface Book for this test) to measure actual instantaneous power consumption at the hardware, sampled once per second and then averaged across the duration of the test. We feel strongly that this is a highly scientific and defensible test setup that mimics typical user behavior and, significantly, measures the same markup and the same duration, on the same hardware, in every browser.
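As a rough illustration of the "sampled once per second, then averaged" step, here is a minimal Python sketch. It assumes the power monitor yields one instantaneous reading in watts per sampling interval; the function name `summarize_power` and the sample values are hypothetical and are not Microsoft's actual tooling.

```python
from statistics import fmean


def summarize_power(samples_w, interval_s=1.0):
    """Summarize power readings taken at a fixed sampling interval.

    samples_w: instantaneous power readings in watts (hypothetical input,
    e.g. one reading per second from an on-board power monitor).
    Returns (average power in W, approximate total energy in J).
    """
    if not samples_w:
        raise ValueError("need at least one sample")
    avg_w = fmean(samples_w)
    # With a fixed interval, energy is approximated by summing
    # power * interval (a rectangle-rule approximation of the integral).
    energy_j = sum(samples_w) * interval_s
    return avg_w, energy_j


# Hypothetical readings from a 5-second window.
avg, energy = summarize_power([4.0, 5.0, 6.0, 5.0, 5.0])
print(f"{avg:.2f} W average, {energy:.1f} J total")  # prints: 5.00 W average, 25.0 J total
```

Because every browser runs the same timed script for the same duration, comparing average power in this setup amounts to comparing total energy used over the run.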


mynewtb 3655 days ago

How did you control for the specific ads the browsers were served? Ads are highly dynamic. Unless you can account for those differences in your methodology, you had better eliminate that influence.

patcheudor 3655 days ago

Ding, ding, ding, and to be fair this is the problem with both methodologies. The only way to accurately perform this test is to spider a bunch of sites, save the contents to a locally hosted HTTPD, ensure all third-party JS calls are resolved locally, and test every browser against exactly the same sites as they were spidered at a point in time. You simply cannot account for changes to the markup, ad network beacons, ads, metrics code, or even network routing, all of which could influence the test one way or the other unless it is run from an identical static cache in a controlled environment. If I were doing this test, I'd then skip the whole scripting automation part and simply add a meta-refresh to every page in the cache to take the browser through the content sequentially, giving each page something like 10 seconds to load and render. Simple, simple, and far more accurate.
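A minimal sketch of the meta-refresh idea, assuming the spidered pages are already saved as local HTML files in one directory and that third-party resources have already been rewritten to local URLs (that step is not shown). The function name `add_meta_refresh_chain` and the file layout are hypothetical.

```python
import re
from pathlib import Path

# Matches an opening <head> or <head ...> tag, but not <header>.
HEAD_TAG = re.compile(r"(<head(?:\s[^>]*)?>)", re.IGNORECASE)


def add_meta_refresh_chain(cache_dir, pages, dwell_s=10):
    """Make each cached page refresh to the next one after dwell_s seconds.

    pages: file names (relative to cache_dir) in the order to visit.
    The last page gets no refresh tag, so the run stops there.
    """
    cache = Path(cache_dir)
    for current, nxt in zip(pages, pages[1:]):
        path = cache / current
        html = path.read_text(encoding="utf-8")
        tag = f'<meta http-equiv="refresh" content="{dwell_s};url={nxt}">'
        # Insert right after <head> when present; otherwise prepend the tag.
        new_html, count = HEAD_TAG.subn(lambda m: m.group(1) + tag, html, count=1)
        if count == 0:
            new_html = tag + html
        path.write_text(new_html, encoding="utf-8")
```

Calling it as `add_meta_refresh_chain("cache", ["page1.html", "page2.html", "page3.html"])` would chain three pages at 10 seconds each. The page's own scripts can still run and change timing, so this only fixes the dwell time, not everything a page does.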

csl 3655 days ago

You don't need to go to such complicated lengths. Just perform enough tests (as in, a statistically large enough number) and a distribution will form. That also captures the variability of real-world network effects.
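To make "a distribution will form" concrete, one common summary is the mean of repeated runs with a confidence interval around it. This is a minimal sketch assuming each run is reduced to one number (for example, average watts); it uses a normal-approximation critical value, which is only reasonable for a fairly large number of runs. The function name `mean_with_ci` is hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def mean_with_ci(trials, confidence=0.95):
    """Return (mean, half_width) of a normal-approximation confidence interval.

    trials: one summary number per independent run (e.g. average watts).
    For small samples, a Student t critical value would be more appropriate
    than the z value used here.
    """
    n = len(trials)
    if n < 2:
        raise ValueError("need at least two trials")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(trials) / math.sqrt(n)
    return fmean(trials), half_width
```

If two browsers' intervals overlap heavily, the runs have not yet separated them; more runs shrink the half-width roughly with the square root of the run count.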

breakingcups 3655 days ago

What about different ads served to different browsers? Someone running, say, Opera will have a different ad profile than a Chrome user even when completely blank cookie-wise.

drdeca 3654 days ago

Maybe have them give the same user agent?

patcheudor 3655 days ago

It's hardly complicated. I've put such tests together in an afternoon. In fact, whatever complexity is added is made up for by the fact that fewer tests are necessary. Via this mechanism you can also remove any questions about compression, use of HTTP/2, etc., which could impact the tests based on server-side choices about serving data to either platform. Equal always equals better.
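For the locally hosted HTTPD, Python's standard library is enough for a sketch. Its built-in file server sends files as-is (no gzip or brotli) and speaks HTTP/1.0 by default, so every browser would receive byte-identical responses with no compression or HTTP/2 negotiation in play. This assumes the cache is a plain directory on disk and that serving over plain `http://127.0.0.1` is acceptable, so HTTPS-specific behavior is not exercised; `serve_cache` is a hypothetical name.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_cache(cache_dir, port=8000):
    """Serve a directory of cached pages over plain HTTP on localhost.

    Returns the running server; call server.shutdown() and
    server.server_close() to stop it. Pass port=0 to let the OS pick one.
    """
    # Bind the handler to the cache directory instead of the current one.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(cache_dir))
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Run the accept loop in a background thread so the caller keeps control.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```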

dogma1138 3655 days ago

But those metrics are important: if servers serve more optimized pages to Edge users for some reason, that's a freaking important fact to know. This is about real-world data and real experiences and how they affect actual users. You can normalize the tests to the point where there is absolutely zero difference between the browsers, of that I'm sure, but that will not reflect anything real users actually experience.

anc84 3654 days ago

That would be a whole different test then. Not about efficiency of the browser itself but about what the browser gets served.

jcoffland 3655 days ago

But this is not repeatable.

dogma1138 3655 days ago

Ads are not that much of a problem; they will even themselves out, and if for some reason MSFT Edge users receive fewer ads, or ads that are less resource-intensive, that's also an important metric.

I don't see anything that would create a bias in favor of a specific browser as far as ad networks go; if anything, the stigma/stereotyping of IE/Edge users would probably mean that ad networks are more incentivized to send the baity apps towards those browsers.

As for the network part, again, that's an important metric: if certain browsers perform better under adverse network conditions, that's an important factor to know. You do not want to give them the best-case scenario every time.

Giving a page a fixed number of seconds to load is also completely the wrong approach. You want to see how browsers behave when they can't load a page properly or when it takes more time than usual: maybe some browsers expend more resources by resubmitting the entire request, maybe some browsers do not parse the DOM tree from scratch when some of the requests stall, maybe some browsers have less resource-intensive placeholders for DOM elements, maybe some browsers are better than others at adjusting the DOM preprocessor for network congestion.

So no, I can't really see how your approach would be any better. The approach that MSFT took was quite good: Netflix, Wikipedia, YouTube, Facebook, etc., with what seems to be realistic user behaviour. What you want to do is put in a test that produces fair results for fairness' sake; that's not how you evaluate anything, because it would not yield any real-world data.

NetStrikeForce 3655 days ago

That would be as far from real world as you can get.

What you need here is a big enough sample.

dogma1138 3655 days ago

You run enough tests to even things out, and if you still have a bias towards certain browsers, then that's a statistically important metric on its own.
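One standard way to check whether a bias survives many runs is a two-sample test on the per-run results of two browsers. This minimal sketch computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom, which do not assume equal variances; the function name `welch_t` is hypothetical, and `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same test along with a p-value if SciPy is available.

```python
import math
from statistics import fmean, variance


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent samples.

    a, b: per-run measurements (e.g. average watts) for two browsers,
    each with at least two runs.
    """
    na, nb = len(a), len(b)
    # Squared standard error of each sample mean.
    va, vb = variance(a) / na, variance(b) / nb
    t = (fmean(a) - fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

A large |t| relative to a t distribution with `df` degrees of freedom suggests the difference between the two browsers is not just run-to-run noise such as varying ads.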

eanzenberg 3655 days ago

You'd probably need to run multiple tests to account for the variance of ads served.
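How many runs is "multiple"? A textbook estimate, assuming run-to-run variation is roughly normal with standard deviation sigma (for example, estimated from a small pilot batch), picks n so that the confidence-interval half-width on the mean is at most a chosen margin: n = ceil((z * sigma / margin)^2). The function name `runs_needed` is hypothetical.

```python
import math
from statistics import NormalDist


def runs_needed(sigma, margin, confidence=0.95):
    """Runs needed so the CI half-width on the mean is at most `margin`.

    sigma: per-run standard deviation (e.g. watts), estimated from pilot runs.
    margin: largest acceptable half-width, in the same units as sigma.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / margin) ** 2)
```

For instance, if ads make per-run average power vary with a standard deviation of 0.5 W and you want the mean pinned down to within 0.1 W at 95% confidence, this gives 97 runs per browser. Halving the margin roughly quadruples the count.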

wernerb 3655 days ago

The way I read the blog post, ad blocking was enabled to show that even _with_ an expensive extension enabled, Opera still beats Edge, but in my opinion that was not the gist of the post.

The post was about Microsoft not being transparent about the methodology/setup/scripts/target websites used. A third party should be able to confirm Microsoft's claims.

Is it possible to publish the setup to your github.com/microsoft so that it can be executed automatically? Heck, go the extra mile: put it in CI and publish the data on a regular basis :)

IshKebab 3655 days ago

Adblock isn't an "expensive extension". Ads are the expensive thing, so blocking them saves power.

oxymoron 3655 days ago

It's not quite that simple [0][1], or at least it wasn't always. AdBlock Plus in particular used to rely on a massive style sheet that was injected into all tabs and frames, which degraded really quickly on pages with a lot of iframes (which, as it happens, is typically pages with lots of ads).

I believe Firefox partly resolved the problem on their end later on, although I can't tell you the exact status of things. Nevertheless, there's definitely some precedent for claiming that ad blocking is an expensive operation. Intuitively, you'd think that network-based blocking would be enough, but it won't work against same-domain ad sources (a typical example being Facebook ads), while CSS selectors are able to capture a bit more depth.

[0] http://www.extremetech.com/computing/182428-ironic-iframes-a...

[1] https://blog.mozilla.org/nnethercote/2014/05/14/adblock-plus...
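The distinction between network-based and selector-based blocking can be sketched in a few lines. In this toy model (all domain and class names are hypothetical; real blockers use large filter lists and real CSS selectors), a host-based network filter cancels requests to known ad hosts but lets a first-party "sponsored" post through, while an element-hiding rule catches it. The catch is that cosmetic rules have to be matched against the DOM in every page and frame, which is where the stylesheet overhead described above came from.

```python
from urllib.parse import urlsplit

# Hypothetical third-party ad hosts; real blockers ship large filter lists.
BLOCKED_HOSTS = {"ads.example-adnetwork.com", "tracker.example.net"}

# Hypothetical element-hiding rule, modeled here as a set of class names.
HIDDEN_CLASSES = frozenset({"sponsored-post"})


def blocked_by_network_filter(request_url):
    """Return True if a host-based network filter would cancel the request."""
    host = urlsplit(request_url).hostname or ""
    # Match the blocked host itself or any subdomain of it.
    return any(host == h or host.endswith("." + h) for h in BLOCKED_HOSTS)


def blocked_by_cosmetic_filter(element_classes, hidden_classes=HIDDEN_CLASSES):
    """Return True if an element-hiding rule would hide an element with these classes."""
    return bool(set(element_classes) & hidden_classes)
```

A request to `https://ads.example-adnetwork.com/banner.js` is cancelled at the network layer, but a sponsored post served from the site's own domain only disappears through the cosmetic rule.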

ryuuchin 3655 days ago

No, it really is that simple[1].

The only reason I can think of for someone not using uBlock Origin is because they've never heard of it.

[1] https://github.com/gorhill/uBlock#performance

moron4hire 3654 days ago

I got rid of it because it made using the web nearly impossible and I got annoyed with constantly having to manage exceptions just to do things like view my Twitter analytics page. Downloading ads is less annoying than uBlock.

ryuuchin 3654 days ago

What filter lists were you using? uBlock Origin's behavior is identical to Adblock Plus's, given that the same filter lists are used for both. uBlock uses a few more filter lists by default, although I still think it's on the conservative end of the spectrum in terms of what gets blocked.

I've only ever had to whitelist two things in uBlock in all my time using it, and that's pretty good considering that I use basically all of the lists except the language-specific ones, the anti-anti-adblock one (which requires a user script), and the merged (ultimate) lists.

But yes, my browsing habits are clearly different from yours, and what works for me may not work for you. I understand that, but what uBlock does is not any different from any of the other ad blockers out there. It just happens to do it more efficiently and with a better UI than the rest of them. The only difference in behavior you might encounter is likely to be related to uBlock Origin's strict blocking[1]. In this case, exceptions are very easy to create, since you literally just have to click the button to disable it permanently and it will be disabled for that site forever.

I've said this before: the best way to use uBlock Origin, if you've never used something like NoScript/uMatrix (or couldn't be bothered with the whitelisting approach), is to try what it calls "medium mode"[2]. Using its dynamic filtering in this way should net you the largest gain for the least amount of effort. If you're looking for something with more control, then you may want to look into uMatrix, since I think its interface is nicer for that sort of control.

[1] https://github.com/gorhill/uBlock/wiki/Strict-blocking

[2] https://github.com/gorhill/uBlock/wiki/Blocking-mode:-medium...

anc84 3654 days ago

Click the uBlock icon, click the power icon, done.

DominikD 3655 days ago

Also Opera has built-in blocking. It's not a JavaScript-heavy extension to begin with. ;)

wernerb 3655 days ago

Until we know exactly which websites/videos etc. are loaded (on the Opera benchmark too!), you, I, and all consumers cannot know whether this is true :)

Aissen 3655 days ago

> nobody seems to challenge our Netflix results:

Of course, this is Lotus vs Office all over again. You're using the OS advantage (Windows 10 and its DRM platform) to show that you can do things more efficiently.

You chose the technology platform (I'm guessing PlayReady), you forced GPU vendors to comply, and now you give competitors a bad reputation for daring to use their own (or a competitor's) technology.

rkangel 3655 days ago

Or, to put it another way: they have built a well-integrated technology stack and are now demonstrating that it has superior performance on a metric we all care about.

Apple has a different approach (own the entire stack) but they're going for the same thing and it has made them a lot of cash.

Tloewald 3654 days ago

Netflix is a rather specific example — don't they use Silverlight (as in "the only site anyone in the world cares about that uses Silverlight"?)

It's like Apple stressing how well their browser works on -- oh wait, Apple doesn't produce a proprietary plugin. Well, let's say the iTunes page, if it were specifically tuned to work well in Safari.

ascagnel_ 3654 days ago

Edge and Safari run Netflix under HTML5's protected media path. Silverlight is only used for backwards compatibility.

spankalee 3655 days ago

Doesn't the Netflix test really test GPU decoding vs software decoding?

I'm not 100% sure about this, so any clarification is welcome, but I thought that the DRM requirements in Netflix's content contracts allowed GPU decoding via PlayReady, but not via some others like Widevine. Would the Netflix test still show such an advantage for Edge if there weren't any DRM?

j_h_s 3655 days ago

It seems to me that if you're running on the same device, the GPU should not be a variable.

caconym_ 3655 days ago

IIRC everything except Edge and Safari are forced to use Silverlight for Netflix streaming. I'm not exactly sure why this is, but it probably has something to do with the mess that is video codec licensing.

ryuuchin 3655 days ago

Chrome uses Widevine (DRM) for Netflix, which plays over HTML5. Really, the more significant difference is that Chrome gets capped at 720p. Edge actually gets the 1080p streams (as do IE11 on Windows 8+ and the Netflix Windows Store app).

Already__Taken 3655 days ago

Seems arbitrary, what's the reasoning for that? Isn't this exactly what everyone warned about with DRM in html5?

ryuuchin 3654 days ago

Your guess is as good as mine. Blame Netflix, Microsoft or the people pushing DRM. Maybe it's some combination of all of the above or none of them. I'm not sure anyone really knows why we have this behavior.

It's not necessarily DRM in HTML5 that is the issue here, although I suppose its fractured nature may be partially to blame. The issue is that Chrome gets capped at 720p, seemingly arbitrarily, and no one knows why. Did Microsoft pay a big chunk of change to Netflix for exclusivity or something along those lines? Do the content creators prefer Microsoft's DRM implementation over Widevine's?

tyleregeto 3655 days ago

Netflix plays fine in Chrome (maybe Firefox too) using HTML5; it was opt-in at one point, not sure about now. I watch on Linux all the time, no Silverlight there.

russjr08 3655 days ago

Starting with Firefox 47, it works fine as well (macOS here), but you need to switch your user agent to Chrome (before you go to Netflix). I'm guessing Netflix hasn't updated their compatibility checker.

jsmith0295 3655 days ago

It's for DRM purposes

caconym_ 3655 days ago

Given that basically everyone should be blocking ads (if for no reason other than that they are a vector for malware), native ad blocking is really an energy saving mode, a user experience improvement, and a crucial security measure all rolled up into one feature.

If the focus of this test is meant to be on the user experience, as the tone of the Microsoft blog post seems to suggest, then said native ad blocker is a killer feature that really might mean Opera users can "stay productive longer".

If not, then sure, Edge's underlying implementation is probably faster. It would be interesting to see a comparable benchmark of Edge with the best available ad blocking turned on.

hristov 3655 days ago

Wow, this is all fascinating! If it is not too much trouble, would you mind releasing the script you used in your testing so that independent parties can verify your results?

8_hours_ago 3655 days ago

It would be interesting to see this test run again with ad blockers enabled on each browser (the latest version of Edge has support for extensions).

ntoshev 3654 days ago

It may not be fair to enable adblocking if you want to compare other engineering components, but these results still mean users in the real world will see more battery time with Opera than with Edge.

JaRail 3654 days ago

This is

* not based on observed user behavior

* not reproducible by you (recorded/replayed network where possible)

* not reproducible by peers

* in stark contrast to reproducible tests provided by others

You describe it as "highly scientific" and jump straight to a marketing campaign. Honestly, who's skipping a lap of the race?

It's pretty clear the data generated is useful from an engineering point of view. It can help identify problem areas. It's also absolutely clear that the conclusions being marketed are blatant misrepresentations of the work.

wdr1 3655 days ago

Could you disclose/open source your methodology?

hristov 3655 days ago

So where is your script? Or are we just supposed to trust you?

kayone 3655 days ago

you don't have to do anything.