the most basic of a test suite should be catching.
A most basic test suite is not likely to wait some arbitrary amount of time (2 seconds, as the author found by trial and error) between calls to the HSM.
The images in the 'Digging deeper' section suggest otherwise. They appear to show a successful run followed by one that fails because the 'encrypted' value is garbage. Where am I missing the instantly reproducible failure?
In what way? 'Temporal' fuzzing to an eon-like range of two seconds seems, naively at least, entirely impractical.
Edit: a somewhat different way of putting this concretely - what is a practical stochastic testing regime that can reasonably be expected to find this bug?
Well designed systems have a mockable clock and timer subsystem.
They could easily test "Do X, wait 100 years, do Y".
You find all kinds of weird bugs when you do that - things that poll, for example a daily cron job, will have to run 36500 times. Certificates expire. Counters overflow. Date systems can't convert the date to and from strings. Logfiles get too big. Etc.
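A rough sketch of what "Do X, wait 100 years, do Y" can look like, assuming the code under test takes its clock as a parameter rather than reading the system time directly. FakeClock, DailyJob, cert_is_valid and simulate_century are all made-up names for illustration, not anything from the HSM or the article:

```python
import datetime as dt


class FakeClock:
    """Hypothetical injectable clock; tests advance it instead of sleeping."""

    def __init__(self, start):
        self._now = start

    def now(self):
        return self._now

    def advance(self, delta):
        self._now += delta


class DailyJob:
    """Hypothetical cron-like poller that runs once per elapsed day."""

    def __init__(self, clock):
        self.clock = clock
        self.last_run = clock.now()
        self.runs = 0

    def poll(self):
        # Catch up on every whole day that has passed since the last run.
        one_day = dt.timedelta(days=1)
        while self.clock.now() - self.last_run >= one_day:
            self.last_run += one_day
            self.runs += 1


def cert_is_valid(clock, not_after):
    """Hypothetical expiry check driven by the injected clock."""
    return clock.now() < not_after


def simulate_century():
    clock = FakeClock(dt.datetime(2020, 1, 1))
    job = DailyJob(clock)
    not_after = dt.datetime(2030, 1, 1)  # assumed certificate expiry date

    valid_before = cert_is_valid(clock, not_after)  # "do X"
    clock.advance(dt.timedelta(days=36500))         # "wait 100 years"
    job.poll()                                      # "do Y"
    return valid_before, cert_is_valid(clock, not_after), job.runs
```

This only works when the clock can actually be injected. It says nothing about a device whose timers you cannot reach from the outside.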
The premise was that basic automated testing by some other team at Google (to whom the HSM is a black box) would catch this. I don't see how that's obvious. Then you're all 'but fuzzing!' and I'm like 'wat' and now you're asking me if I know you can set the clock on the HSM? I don't think I know that. It's a black box.
Perhaps catching the instantly reproducible failure before release would have led them to the root cause of both.