I'd pay to see Bryan Cantrill's reaction [0][1] to this: a seemingly mysterious firmware bug in a secure element / trusted execution environment, with no way of knowing whether there are more bugs (or, shudder, backdoors).
Since the source code isn't available for scrutiny (though Google has promised firmware transparency [2]), it is difficult to tell what really went wrong in the reported case, or what else could go wrong. That matters because the use cases are far-reaching and sensitive: Google has advocated StrongBox as a trustworthy companion that could be used to attest user actions on medical devices [3], for instance, or for identity verification with documents such as driving licenses and passports.
My reaction might not be worth the price of admission, because it's exactly what you'd expect: this is great, diligent, responsible work -- and if the Titan M firmware had been open, the author would likely have been able to get to the root cause (or much closer to it). That in turn would have shortened Google's response time, led to a better fix sooner, etc.
That said, there are several reasons for optimism.
1. OpenTitan.[1] On the one hand, this is not about opening up extant Titan implementations so much as developing next-gen Titan in the open -- but it is nonetheless a laudable and important effort and it is increasingly reasonable to expect that the hardware roots-of-trust of the future will be entirely open.
2. Open firmware more generally. The Open Source Firmware Conference[2] this past fall was truly inspiring in terms of the broad interest from the industry: while there is much work to be done, there is more reason than ever to believe that it's attainable.
3. Rust. It's hard to speculate without knowing what the root cause of this issue actually was, but to the degree that memory corruption was at root here, the emergence of Rust for firmware is an incredibly important development. Speaking personally, if there was any doubt in my own mind about the appropriateness of Rust at this lowest level of software, it has been erased by our own experiences at Oxide over the past few months: Rust is unequivocally the right language for firmware, and it will yield higher quality artifacts.[3]
More troubling to me than the closed source firmware is that the bug in TFA seems like something that the most basic of a test suite should be catching.
It’s reminiscent of Apple’s “goto fail” bug, which silently skipped a TLS signature check: another easily testable case that simply wasn’t tested.
The test authors don’t even need to be on the same team or report to the same manager. They can just write black-box tests against the spec, like the author of this post did.
I’m not even some big TDD guy. It just seems to me that core security-critical libraries and functions, which should be largely side-effect-free, deserve some basic “receive x, produce y” functional tests to make sure the API does what it says on the tin.
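To make the “receive x, produce y” idea concrete, here is a minimal sketch of a black-box round-trip check written only against an interface, the way an outside tester might work from a spec. The `Keystore` protocol, its `encrypt`/`decrypt` method names, `ToyKeystore`, and `check_round_trip` are all hypothetical stand-ins, not the actual StrongBox or Keymaster API, and the toy cipher is NOT real cryptography; it exists only so the check has something to run against.

```python
import hashlib
import os
from typing import Protocol


class Keystore(Protocol):
    """Hypothetical interface: the only surface a black-box test relies on."""

    def encrypt(self, plaintext: bytes) -> bytes: ...

    def decrypt(self, ciphertext: bytes) -> bytes: ...


class ToyKeystore:
    """Hypothetical stand-in device. NOT real crypto: XOR with a SHA-256 keystream."""

    NONCE_LEN = 16

    def __init__(self) -> None:
        self._key = os.urandom(32)

    def _keystream(self, nonce: bytes, length: int) -> bytes:
        # One 32-byte SHA-256 block per counter value, truncated to the needed length.
        blocks = [
            hashlib.sha256(self._key + nonce + i.to_bytes(8, "big")).digest()
            for i in range((length + 31) // 32)
        ]
        return b"".join(blocks)[:length]

    def encrypt(self, plaintext: bytes) -> bytes:
        nonce = os.urandom(self.NONCE_LEN)
        stream = self._keystream(nonce, len(plaintext))
        return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))

    def decrypt(self, ciphertext: bytes) -> bytes:
        nonce, body = ciphertext[: self.NONCE_LEN], ciphertext[self.NONCE_LEN :]
        stream = self._keystream(nonce, len(body))
        return bytes(c ^ s for c, s in zip(body, stream))


def check_round_trip(ks: Keystore, samples: list[bytes]) -> list[str]:
    """Spec-level "receive x, produce y" checks; returns failures (empty list = pass)."""
    failures = []
    for pt in samples:
        ct = ks.encrypt(pt)
        if pt and ct == pt:
            failures.append(f"ciphertext equals plaintext for {pt!r}")
        if ks.decrypt(ct) != pt:
            failures.append(f"round trip failed for {pt!r}")
    return failures


if __name__ == "__main__":
    print(check_round_trip(ToyKeystore(), [b"", b"hello", os.urandom(1000)]))  # prints []
```

Because `check_round_trip` only touches the two protocol methods, the same checks could in principle be pointed at a real device binding. Note that a check like this, run back to back, exercises no idle time between calls.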
> the most basic of a test suite should be catching.
The most basic test suite is unlikely to wait an arbitrary amount of time (two seconds, which the author found by trial and error) between calls to the HSM.
The images in the 'Digging deeper' section suggest otherwise. They appear to show a successful run followed by one that fails because the 'encrypted' value is garbage. Where am I missing the instantly reproducible failure?
In what way? 'Temporal' fuzzing to an eon-like range of two seconds seems, naively at least, entirely impractical.
Edit: to put it more concretely, what is a practical stochastic testing regime that could reasonably be expected to find this bug?
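One possible answer, sketched under assumptions: rather than sampling delays uniformly, draw the idle gap before each call from a log-uniform distribution, so a few hundred trials cover milliseconds through tens of seconds with roughly equal weight per decade. The `sleep` parameter is injectable so the harness can run against simulated time. The names `fuzz_idle_gaps`, `SimulatedTime`, and `FlakyAfterIdle`, the 10 ms to 30 s range, and the trial count are all hypothetical; `FlakyAfterIdle` only mimics the reported "fails after about two seconds idle" symptom and says nothing about the real root cause.

```python
import math
import random
import time
from typing import Callable, Optional


def log_uniform(rng: random.Random, lo: float, hi: float) -> float:
    """Sample from [lo, hi] so that each decade (0.1-1 s, 1-10 s, ...) is equally likely."""
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))


def fuzz_idle_gaps(
    op: Callable[[], bool],
    trials: int,
    lo: float = 0.01,
    hi: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    seed: int = 0,
) -> list[float]:
    """Call op() after random idle gaps; return each gap that preceded a failed call."""
    rng = random.Random(seed)  # fixed seed so a failing schedule can be replayed
    failing_gaps = []
    op()  # warm-up call, so every trial follows some earlier activity
    for _ in range(trials):
        gap = log_uniform(rng, lo, hi)
        sleep(gap)
        if not op():
            failing_gaps.append(gap)
    return failing_gaps


class SimulatedTime:
    """Stand-in for time.sleep that advances a counter instead of blocking."""

    def __init__(self) -> None:
        self.now = 0.0

    def sleep(self, seconds: float) -> None:
        self.now += seconds


class FlakyAfterIdle:
    """Hypothetical device model: a call fails if the previous one was >= idle_limit s ago."""

    def __init__(self, clock: SimulatedTime, idle_limit: float = 2.0) -> None:
        self.clock = clock
        self.idle_limit = idle_limit
        self.last_call: Optional[float] = None

    def __call__(self) -> bool:
        ok = self.last_call is None or self.clock.now - self.last_call < self.idle_limit
        self.last_call = self.clock.now
        return ok


if __name__ == "__main__":
    clock = SimulatedTime()
    failing = fuzz_idle_gaps(FlakyAfterIdle(clock), trials=200, sleep=clock.sleep)
    print(f"{len(failing)} of 200 calls failed; shortest failing gap {min(failing):.2f} s")
```

With these defaults about a third of the gaps land beyond two seconds (ln 15 / ln 3000 ≈ 0.34). Against a real device with the real `time.sleep`, the mean gap is (hi − lo) / ln(hi / lo) ≈ 3.7 s, so 200 trials take roughly 12 to 13 minutes: slow for a unit test, but cheap for a nightly hardware run.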
Well-designed systems have a mockable clock and timer subsystem.
They could easily test "Do X, wait 100 years, do Y".
You find all kinds of weird bugs when you do that. Things that poll (a daily cron job, for example) will have to run 36,500 times. Certificates expire. Counters overflow. Date libraries can't convert dates to and from strings. Log files grow too big. Etc.
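A minimal sketch of "do X, wait 100 years, do Y", assuming the code under test reads time only from an injected clock object instead of the system clock. `FakeClock`, `call_every`, `advance`, and `Certificate` are hypothetical illustrations, not a particular library, and the 825-day certificate lifetime is arbitrary. The "100 years" are modeled as 365 × 100 days, so the daily timer fires exactly 36,500 times; a calendar-accurate span would be about two dozen days longer.

```python
import heapq
import itertools
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable


class FakeClock:
    """Mockable clock and timer subsystem: time moves only when a test says so."""

    def __init__(self, start: datetime) -> None:
        self._now = start
        self._timers: list = []  # heap of (due, seq, interval, callback)
        self._seq = itertools.count()  # tie-breaker so equal due times never compare callbacks

    def now(self) -> datetime:
        return self._now

    def call_every(self, interval: timedelta, callback: Callable[[], None]) -> None:
        heapq.heappush(self._timers, (self._now + interval, next(self._seq), interval, callback))

    def advance(self, delta: timedelta) -> None:
        """Jump forward by delta, firing every timer that falls due, in time order."""
        end = self._now + delta
        while self._timers and self._timers[0][0] <= end:
            due, _, interval, callback = heapq.heappop(self._timers)
            self._now = due  # callbacks observe the time they were scheduled for
            callback()
            heapq.heappush(self._timers, (due + interval, next(self._seq), interval, callback))
        self._now = end


@dataclass
class Certificate:
    """Hypothetical certificate record with only an expiry date."""

    not_after: datetime

    def is_valid(self, now: datetime) -> bool:
        return now <= self.not_after


if __name__ == "__main__":
    start = datetime(2020, 1, 1, tzinfo=timezone.utc)
    clock = FakeClock(start)
    runs = []
    clock.call_every(timedelta(days=1), lambda: runs.append(clock.now()))  # "cron daily"
    cert = Certificate(not_after=start + timedelta(days=825))
    print(cert.is_valid(clock.now()))  # do X: prints True
    clock.advance(timedelta(days=365 * 100))  # wait "100 years" (leap days ignored)
    print(len(runs), cert.is_valid(clock.now()))  # do Y: prints 36500 False
```

The heap keeps timers firing in time order even when several intervals interleave, and the whole century runs in well under a second because nothing actually sleeps.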
[1] https://opentitan.org/
[2] https://osfc.io/
[3] http://cliffle.com/blog/prefer-rust/