| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by eridius 2843 days ago

> This is a little more than eight thousandths of one bit. Therefore, the information given by one passing test run is just a little over one millibyte. To debug this compiler, we're going to be running hundreds and hundreds of experiments, each of which gives us a tiny fraction of one bit, until we can finally put together the single byte we need to isolate the faulty optimization.

It doesn't work that way. If you're dead set on measuring a single passing test run as a milllibyte's worth of information, you have to realize that every subsequent passing test run is worth less and less. Otherwise running a thousand passing tests would give you a full byte of information, and yet a thousand passing tests does not give you 100% confidence that the enabled optimizations are not buggy. The only way to get a full byte's worth of information is to actually hit upon a failing test run.

And in fact even this is mistaken. You don't have a millibyte's worth of information. You have a lot more. The number of passing test runs at a given optimization gives you a probability that the optimization is buggy. Claiming that all this information you have about probabilities of the bugginess of various optimization passes is really just a millibyte is misleading; you have a lot of information, you're just ignoring it and saying you only care about a millibyte's worth of it.

2 comments

hinkley 2843 days ago

I run into this with much larger p and have to explain it to coworkers.

Flip the scenario around: the code is fine but the test has a concurrency bug in it. It fails about 20% of the time. You ask someone to fix it. After six green test runs they declare the bug fixed. But then build 8 fails and so does build 10.

Why? Because probabilities aren’t additive, they’re multiplicative. You didn’t test 6p > 1, you tested (1 - o)^6 < 0, which will never be true.

In this case there was a 26% chance the test was still broken after 6 green tests. You need 11 runs just to get the probability into single digits.

(Another consequence of this is that if you have 5 flaky tests the probability of a green build is one in four, and the likelihood of getting multiple consecutive red builds is high enough that it happens every week or two).

link

Silixon 2843 days ago

In the limit of infinite passing tests, do we recover the byte we sought? Or does it asymptote to some other quantity? I'm asking relative to a maximum entropy prior.

link