| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by hauntsaninja 125 days ago
	git bisect works great for tracking down regressions, but relies on the bug presenting deterministically. But what if the bug is non-deterministic? Or worse, your behaviour was always non-deterministic, but something has changed, e.g. your tests went from somewhat flaky to very flaky. In addition to the repo linked in the title, I also wrote up a little bit of the math behind it here: https://hauntsaninja.github.io/git_bayesect.html

4 comments

ajb 121 days ago

Nice! I implemented a similar thing a while back: https://github.com/Ealdwulf/BBChop

I'm going to have to check out how you got linear time with Shannon entropy, because I used Renyi entropy to do that, to make the algebra easier.

It's also possible to do it over the DAG, rather than a linear history - although that makes the code a lot more complicated. Unfortunately there doesn't seem to be a linear time cumulative sum algorithm over dags, so it's super linear in some cases.

link

Myrmornis 122 days ago

This is really cool! Is there an alternative way of thinking about it involving a hidden markov model, looking for a change in value of an unknown latent P(fail)? Or does your approach end up being similar to whatever the appropriate Bayesian approach to the HMM would be?

link

belden 121 days ago

My team doesn’t always have cleanly bisectable branches being merged to main —- it’s not uncommon to see “fix syntax error” types of commits.

But, to merge we need to have all tests pass. (If tests flakily pass then we get new flakey tests, yay!)

I know git-bisect doesn’t support this: but could git-bayesect have an option to only consider merge commits? Being able to track a flake change back to an individual PR would be really useful.

link

sluongng 121 days ago

You can run bisect with first-parent

link

hauntsaninja 121 days ago

That sounds right. `git_bayesect` currently uses `--first-parent`, so I think belden's use case should work, but I haven't tested it much on complicated git histories.

link

menaerus 120 days ago

> I know git-bisect doesn’t support this

It does.

link

llasram 121 days ago

Neat! I work on a system with some very similar math, but a slightly different model. I really like how in bayesect making the error rates asymmetric via independent Beta priors on bidirectional errors allows the computations to be nice and symmetric.

I haven't worked these all the way through, but I'm slightly skeptical or at least confused by a few details:

1. Another way to frame P(D|B=b) would be to have the old vs new side draws be beta-binomial distributed, in which case we should then have binomial coefficients for each of the draw side probabilities for the number of possible orderings of the observations. Do they end up cancelling out somewhere? [ed: Oh yes, of course -- D includes that in each case we observe exactly one of the C(n,k) orderings.]

2. I think your expected conditional entropy code is treating the imputed new observations as independent from the past observations, though even if that's the case it may not affect it much in this model. If it does though, it might be worth explicitly unit-testing the naive vs efficient calculations to ensure they match.

Anyway, thanks for sharing!

link