Do you mean as opposed to, e.g., verifying the absence of timing attacks? While I agree that verifying the absence of timing attacks is probably much harder than what was done here, the difficult part of the s2n verification I linked to was that we verified equivalence between imperative C code and a functional mathematical specification.
Right, it says "convincing argument that the C implementation does the same thing as the mathematical specification" and "Assuming that we didn’t accidentally program the same “bug” into our Cryptol spec".
My understanding is that it's another way of white-box testing the code against specified behaviour, except that using a (proven?) mathematical specification for algorithms is probably easier than writing unit tests that have to capture all edge cases. (In essence, it sounds like verification software is set up to detect such edge cases, which I do think is a good idea, because you only have to write such software once.)
I don't think I understand what you mean by "white-box testing" here, but perhaps it's helpful to clarify what I meant by "equivalence" above, and how it relates to testing: what we did here was verify input/output equivalence between the imperative C code and our functional mathematical spec in Cryptol, for a range of key and input buffer sizes. This corresponds to testing all inputs of those sizes, which is not possible to do by direct testing: e.g., for a 64-byte key and a 1000-byte message, the equivalence corresponds to checking $2^{8 \cdot (64 + 1000)} = 2^{8512}$ tests, which would take "forever" to verify by direct testing.
We did not prove any properties of our mathematical specification in Cryptol, but the claim is that it's close enough to the official FIPS mathematical specification for HMAC [1] that it's easy to believe that it's correct. However, a group at Princeton has also verified HMAC in the past, and gone further than us by not only proving that the imperative C code is input/output equivalent to their mathematical spec in Coq, but also proving that their mathematical spec has the security properties of a secure hash function [2].
AFAIK, white-box testing is simply testing where you can look at the source code (as opposed to black-box testing); for example, a unit test is a type of white-box test.
What I was struggling to express is that in the mathematical notation the operations are well defined (right?), whereas in C that's not necessarily the case. So you could argue that if you were writing direct tests, you don't need to check all inputs; testing the edge cases will do. Maybe that's true, but it's practically impossible for complex algos, because how do you know which inputs cause edge-case behaviour? So I was agreeing that this approach is probably better (more thorough and reliable) than having some fallible human write test cases :) And although you'd have to make sure the same fallible human hasn't put bugs in the mathematical spec, as you've said, that's probably easier to check.
EDIT: Never mind, I found part three about undefined behaviour. I had written: You seem to know loads about this; maybe you could say how undefined C behaviour is handled when comparing against a spec? Is, e.g., shift-past-bitwidth simply forbidden? The only alternative I can think of is looking at the disassembly on a certain platform and checking those instructions, which sounds less than ideal.
* the operations in the mathematical spec are mostly well defined, but e.g. division by zero is not defined. However, the verification handles this by checking that all operations are well-defined on all possible inputs.
* yes, identifying the "edge cases" is not something you can do easily, and it's hard to make formal. In some sense, the fact that the non-edge-case inputs are treated in a uniform way is probably what allows the verification to succeed at all.
* a short summary of the answer you already found in the third blog post: what we actually verify is the LLVM assembly that Clang produces when compiling the C program. Much of the potentially undefined behavior in a C program is translated away by the compiler on the way to LLVM assembly. For any potential undefined behavior that remains in the LLVM assembly, the verification checks that it cannot happen at runtime.