by boothby
2582 days ago
I'm the primary developer of a heuristic, nondeterministic algorithm. It's both production software and a never-ending research project. Specifically, I can't guarantee that a particular random seed will always produce identical results, because that would hobble my ability to make future improvements to the heuristic. I've got reasonable test coverage of my base classes and subroutines, but minor changes to the heuristic can have a significant impact on its "power". My solution was to add a calibrated set of benchmarks. For each problem in the test suite, I empirically measure the probability of failure. From that probability, I can compute the probability of n repeated failures. Small regressions are ignored, but large regressions (p < .001) splat on CI. It's fast enough, accurate enough, and brings peace of mind. I understand that engineers hate this, and I understand why. But it's greatly superior to nothing.
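For anyone wanting to try something similar, here's a minimal sketch of how I'd read the calibrate-then-check idea. It assumes runs are independent and that "n repeated failures" means p_fail**n, so CI gives each problem n attempts and only fails the build if every attempt fails. The names `Solver`, `estimate_failure_rate`, `attempts_needed`, `regression_detected` and the toy solver are all hypothetical illustrations, not the actual code behind this comment.

```python
import random
from typing import Callable

# Hypothetical interface: one run of the heuristic on one benchmark problem,
# returning True on success. Not the author's actual API.
Solver = Callable[[random.Random], bool]

ALPHA = 0.001  # significance threshold mentioned in the comment (p < .001)


def estimate_failure_rate(solve: Solver, trials: int, rng: random.Random) -> float:
    """Calibration step: empirical fraction of runs that fail."""
    failures = sum(1 for _ in range(trials) if not solve(rng))
    return failures / trials


def attempts_needed(p_fail: float, alpha: float = ALPHA) -> int:
    """Smallest n with p_fail**n < alpha, i.e. n straight failures would be
    unlikely if runs are independent and the failure rate hasn't changed."""
    if not 0.0 <= p_fail < 1.0:
        raise ValueError("calibrated failure rate must be in [0, 1)")
    n = 1
    while p_fail ** n >= alpha:
        n += 1
    return n


def regression_detected(solve: Solver, p_fail: float, rng: random.Random,
                        alpha: float = ALPHA) -> bool:
    """CI step: give the problem n attempts; flag a regression only if all n fail."""
    for _ in range(attempts_needed(p_fail, alpha)):
        if solve(rng):
            return False  # one success is enough to pass
    return True  # n straight failures: probability < alpha under the calibrated rate


if __name__ == "__main__":
    # Toy stand-in for the heuristic (hypothetical): succeeds about half the time.
    toy_solve: Solver = lambda r: r.random() >= 0.5
    rng = random.Random(12345)
    p = estimate_failure_rate(toy_solve, 10_000, rng)
    print(f"calibrated p_fail={p:.3f}, attempts={attempts_needed(p)}")
    print("regression!" if regression_detected(toy_solve, p, rng) else "ok")
```

For example, a problem that fails half the time gets 10 attempts, since 0.5**10 is about 0.00098 while 0.5**9 is about 0.002. Small regressions in that problem's failure rate will usually still pass, which matches the "small regressions are ignored" trade-off; only a large drop in power makes ten straight failures likely.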