|
I have worked on a performance-portable math library. We implement BLAS, sparse matrix operations, a variety of solvers, some ODE stuff, and various utilities for a variety of serial and parallel execution modes on x86, ARM, and the three major GPU vendors. The simplest and highest-impact tests are all the edge cases - if an input matrix/vector/scalar is 0/1/-1/NaN, that usually tells you a lot about what the outputs should be. It can be difficult to determine a sensible numerical limit for the error in the algorithms. The simplest example is a dot product - summing floats is not associative, so doing it in parallel does not give bitwise the same result as doing it serially. For dot in particular it's relatively easy to come up with an error bound, but for anything more complicated it takes a particular expertise that is not always available. This has been a work in progress, and sometimes (usually) we just picked a magic tolerance out of thin air that seemed to work. Solvers are tested using analytical solutions and by inverting them, e.g. if we're solving Ax = y for x, then Ax should come out "close" to the original y (see error tolerance discussion above). One of the most surprising things to me is that the suite has identified many bugs in vendor math libraries (OpenBLAS, MKL, cuSPARSE, rocSPARSE, etc.) - a major component of what we do is wrap up these vendor libraries in a common interface so our users don't have to do any work when they switch supercomputers, so in practice we test them all pretty thoroughly as well. Maybe I can let OpenBLAS off the hook due to the wide variety of systems they support, but I expected the other vendors would do a better job since they're better-resourced. For this reason we find regression tests to be useful as well. |
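To make the dot product point concrete, here is a minimal sketch of what such a tolerance could look like. It is not the library's actual test code: the function names are made up for illustration, and it assumes IEEE double precision with no overflow or underflow. It uses the textbook bound that a length-n dot product, summed in any order, differs from the exact value by at most gamma_n * sum(|x_i| * |y_i|), where gamma_n = n*u / (1 - n*u) and u is the unit roundoff. Two differently ordered results can therefore differ from each other by at most twice that.

```python
import math
import sys


def dot_serial(x, y):
    """Left-to-right accumulation, as a plain serial loop would do it."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total


def dot_chunked(x, y, chunks=4):
    """Sum per-chunk partial results, mimicking a parallel reduction."""
    step = max(1, (len(x) + chunks - 1) // chunks)
    total = 0.0
    for i in range(0, len(x), step):
        total += dot_serial(x[i:i + step], y[i:i + step])
    return total


def dot_error_bound(x, y):
    """Worst-case distance of a computed dot product from the exact one.

    Assumes IEEE doubles and no overflow/underflow; valid for any
    summation order.
    """
    n = len(x)
    u = sys.float_info.epsilon / 2  # unit roundoff for double precision
    gamma_n = n * u / (1 - n * u)
    return gamma_n * math.fsum(abs(a) * abs(b) for a, b in zip(x, y))


def dots_agree(x, y, chunks=4):
    """True if the serial and chunked results are within twice the bound."""
    diff = abs(dot_serial(x, y) - dot_chunked(x, y, chunks))
    return diff <= 2 * dot_error_bound(x, y)
```

The bound scales with the magnitude of the inputs, so the same test works whether the vectors hold values near 1e-8 or 1e8. It is a worst-case bound, so it is looser than the error typically observed in practice.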
Probably worth mentioning that in general the tolerance should be relative error, not absolute, for floating point math. Absolute error tolerance should only be used when there’s a maximum limit on the magnitude of inputs, or the problem has been analyzed and understood.
I know that doesn’t stop people from just throwing in 1e-6 all over the place, just like the article did. (Hey I do it too!) But if the problem hasn’t been analyzed then an absolute error tolerance is just a bug waiting to happen. It might seem to work at first, but then trip up and confuse someone as soon as the tests use bigger numbers. Or maybe worse, fail to catch a bug when they start using smaller numbers.
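A small sketch of both failure modes, using Python's standard `math.isclose`. The helper names and the specific numbers are just illustrative choices, not anything from the article.

```python
import math

ABS_TOL = 1e-6   # the "magic" absolute tolerance
REL_TOL = 1e-9   # a relative tolerance, chosen here only for illustration


def close_abs(a, b):
    """Absolute-only comparison: |a - b| <= ABS_TOL."""
    return math.isclose(a, b, rel_tol=0.0, abs_tol=ABS_TOL)


def close_rel(a, b):
    """Relative-only comparison: |a - b| <= REL_TOL * max(|a|, |b|)."""
    return math.isclose(a, b, rel_tol=REL_TOL, abs_tol=0.0)


# Big numbers: the values agree to about 10 significant digits, but the
# absolute check rejects them because the raw difference is 1.0.
big_expected, big_actual = 1e10, 1e10 + 1.0
big_abs_ok = close_abs(big_expected, big_actual)
big_rel_ok = close_rel(big_expected, big_actual)

# Small numbers: the result is off by a factor of two, but the absolute
# check accepts it because the raw difference is only 1e-9.
small_expected, small_actual = 1e-9, 2e-9
small_abs_ok = close_abs(small_expected, small_actual)
small_rel_ok = close_rel(small_expected, small_actual)
```

A purely relative check has its own gap when the expected value is exactly zero, which is where a small absolute tolerance backed by a known bound on the input magnitudes comes back in.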