


	by hohenheim 2176 days ago
	Fantastic read. My only concern is that there wasn't any discussion of the cost of false positives (selecting a test to run when it is unnecessary) versus false negatives (incorrectly dismissing a relevant test), since those costs are not symmetrical. The cost of a bug slipping through because a test was skipped will be higher than the cost of running a test that is irrelevant to a commit.
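To make the asymmetry concrete, here is a minimal sketch of a cost-weighted selection rule. It is not the method from the article: the function names, the failure probabilities, and the cost values are all hypothetical, and it assumes a model that outputs a per-test failure probability for a given commit.

```python
# Hypothetical sketch: run a test when the expected cost of skipping it
# (probability of failure times the cost of a missed regression) exceeds
# the cost of running it. All numbers below are made up for illustration.

def should_run(p_fail: float, cost_run: float, cost_miss: float) -> bool:
    """Return True when skipping is expected to cost more than running."""
    return p_fail * cost_miss > cost_run


def select_tests(predictions: dict, cost_run: float, cost_miss: float) -> list:
    """Keep the tests whose expected miss cost outweighs their run cost."""
    return [
        name
        for name, p_fail in predictions.items()
        if should_run(p_fail, cost_run, cost_miss)
    ]


# Assumed model output: probability that each test fails on this commit.
predictions = {"test_layout": 0.20, "test_network": 0.02, "test_audio": 0.001}

# Symmetric costs: nothing clears the bar, so every test is skipped.
symmetric = select_tests(predictions, cost_run=1.0, cost_miss=1.0)

# A miss assumed to cost 100x a run: the threshold drops to p_fail > 0.01.
asymmetric = select_tests(predictions, cost_run=1.0, cost_miss=100.0)

print(symmetric)
print(asymmetric)
```

With these assumed numbers, raising the cost of a miss lowers the probability threshold from 1.0 to 0.01, so the selector deliberately accepts more unnecessary runs in exchange for fewer missed regressions.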

4 comments

halbersa 2175 days ago

One of the authors here. First off, thanks!

Yes, the cost of a regression slipping through would far outweigh the benefit of running fewer tests. What the post didn't make very clear is that, thanks to our integration branch, the chance of a missed regression is still nearly zero. If the scheduling algorithm misses something, the failure will show up on a "backstop" push. These are pushes where we run everything; a human code sheriff then inspects any failures and, if something was missed, figures out what caused it and backs it out.
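The backstop idea can be sketched roughly as follows. This is only an illustration: the comment does not say how backstop pushes are chosen, so the fixed interval, the function name, and the task names here are assumptions.

```python
# Hypothetical sketch of a backstop schedule. The interval of 10 pushes is
# invented for illustration; the thread does not state the real criterion.

def tasks_for_push(push_index, selected, all_tasks, backstop_interval=10):
    """Return the tasks to run for one push to the integration branch."""
    if push_index % backstop_interval == 0:
        # Backstop push: run everything so missed regressions surface here.
        return set(all_tasks)
    # Ordinary push: run only what the scheduling algorithm selected.
    return set(selected) & set(all_tasks)


all_tasks = {"mochitest", "reftest", "xpcshell"}
selected = {"reftest"}

print(sorted(tasks_for_push(3, selected, all_tasks)))
print(sorted(tasks_for_push(10, selected, all_tasks)))
```

Under this scheme a regression missed on an ordinary push is caught at the next backstop at the latest, which is where the sheriffs' extra work comes from.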

So the costs of missed regressions are: 1) more strain on the sheriffs (too much strain means needing to hire more), and 2) more backouts, which are annoying to developers and can mess up annotation (though we have ideas to fix the latter).

For the record, the algorithm with the 70% reduction in tests has a regression rate almost on par with the baseline (it's ~3-4% lower). This hasn't seemed to result in much additional strain on the pipeline.


jeffbee 2175 days ago

There isn't any discussion of the cost at all. It just says the test run rate is down by 70%; it doesn't say anything about the defect detection rate, even though they say this is their cost function.

10 core-years per day sounds like a lot, but it's only about a 10 kW load, and they've saved 70% of that, or about $20 of opex per day.
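The arithmetic behind that estimate can be reconstructed from the numbers in the comment. The only added assumption is a 365-day year; the per-core wattage and the electricity price are not stated in the comment, they are simply what its figures imply.

```python
# Back-of-the-envelope check using only the figures quoted in the comment.
DAYS_PER_YEAR = 365          # assumption: ignore leap years
core_years_per_day = 10      # from the blog post, as quoted
load_kw = 10.0               # the commenter's estimate of the total load
reduction = 0.70             # reported reduction in test runs
claimed_savings_usd = 20.0   # the commenter's opex figure per day

# 10 core-years of work per day means this many cores busy around the clock.
cores = core_years_per_day * DAYS_PER_YEAR

# Power per core implied by a 10 kW total load.
watts_per_core = load_kw * 1000 / cores

# Energy no longer consumed each day if 70% of the load goes away.
saved_kwh_per_day = load_kw * reduction * 24

# Electricity price implied by the $20/day figure.
implied_usd_per_kwh = claimed_savings_usd / saved_kwh_per_day

print(cores)                          # 3650
print(round(watts_per_core, 2))       # about 2.74 W per core
print(round(saved_kwh_per_day, 1))    # about 168 kWh per day
print(round(implied_usd_per_kwh, 3))  # about $0.119 per kWh
```

So the $20 figure counts electricity only, at roughly 3 W per core and about 12 cents per kWh.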


halbersa 2175 days ago

One of the authors here. I can't exactly deny that line was added to sound impressive, so guilty as charged. However, the savings are much higher than $20/day for a few reasons:

* Many tasks run on expensive instances (hardware acceleration, Windows)

* We have OSX/Android pools that run on physical devices in a data centre (these are an order of magnitude more expensive than Linux)

* There are ancillary costs. For example, each task generates artifacts, which incur storage costs. These artifacts are then downloaded, which incurs transfer costs.

* There are also overhead costs (idle time, rebooting, etc.) that aren't counted in the 10 core-years/day stat.

All these things see a corresponding decrease in costs with fewer tasks.


dmurray 2175 days ago

Is that really all? That would be 3650 cores running full time. 3W per core sounds too low for power consumption. And do power costs really dominate the price of running CPUs? I'm guessing the savings here are at least one order of magnitude more than your $20/day.

I get about $1000/day based on some EC2 prices for typical machines I've used, though I'm sure Mozilla's requirements are different and they can negotiate better prices than I can.
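For scale, the $1000/day estimate can be turned into an implied price per core-hour. The comment does not say which instance types or prices were used, nor whether the figure covers the full load or only the 70% saved, so this sketch computes both readings rather than guessing.

```python
# Implied per-core-hour price behind a $1000/day estimate.
# Inputs come from the thread; which reading applies is not stated there.
cores = 10 * 365             # 10 core-years per day, assuming a 365-day year
reduction = 0.70             # reported reduction in test runs
estimate_usd_per_day = 1000.0

core_hours_total = cores * 24
core_hours_saved = cores * reduction * 24

# Reading 1: $1000/day is the cost of the work that was eliminated.
usd_per_core_hour_if_savings = estimate_usd_per_day / core_hours_saved

# Reading 2: $1000/day is the cost of the whole pre-reduction load.
usd_per_core_hour_if_total = estimate_usd_per_day / core_hours_total

print(round(usd_per_core_hour_if_savings, 4))  # about 0.0163
print(round(usd_per_core_hour_if_total, 4))    # about 0.0114
```

Either way the estimate works out to between one and two cents per core-hour.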


jeffbee 2175 days ago

I probably missed a few factors, but I just hate a blog post that uses big-sounding numbers when they aren't big.


bonoboTP 2175 days ago

Big for who? Hundreds of machines running constantly is big for me.


mlthoughts2018 2176 days ago

> "The cost of a bug slipping through because a test being skipped will be higher than running an irrelevant test to a commit."

It really depends on the type of bug, and perhaps this could be factored into the model by also correlating change sets with outage severity or complexity of a fix.


sfink 2174 days ago

"A bug slipping through" in this case just means slipping through to where it's detected on a later push to the integration branch, or failing that, when a more complete set of tests runs when the change is merged into the main branch. In no case will poor scheduling here result in a bug making it into the final product. It's just that it's more costly in human time to detect it later, so currently the entire goal is set at detecting the problem on the first round of testing after a push.


im3w1l 2175 days ago

They talk about reducing on-commit test-runs. I'd assume they all run pre-release.


weaksauce 2175 days ago

They have a try server that developers can push to in order to run a swath of tests before bringing changes into the integration branch. Outsiders can get access by being vouched for by a Mozilla developer, and insiders obviously have access already. Having used it as an outsider, I found it kind of a pain, with a lot of setup and options. So having something like `mach try auto` would be awesome for outside devs, in addition to the reduced server costs.
