I'm surprised the summary of the article only talked about over-reliance on fuzzing and then suggested 1) more thorough code reviews and 2) sandboxing as solutions?! To me, the solution lies in using memory-safe languages.
I think sandboxing is the more powerful solution. You think in terms of "What privileges can the attacker gain if this code blows up?" and limit the code's privileges to the minimum.
Problem is, sandboxing is harder to implement so it's often done suboptimally or not at all.
Yes and Google Chrome has invested heavily in sandboxing and still had to ship this as a high-priority fix. I'd say sandboxing in conjunction with memory-safe languages is the future.
The problem is that rewriting existing code into a memory-safe language is a huge investment -- and realistically the world depends on a lot of code built over many decades that cannot be rewritten overnight. Consider that Mozilla created Rust specifically so they could rewrite their browser in it, yet still only a small fraction of Firefox code is Rust today -- much more is still C/C++. Realistically we're going to have a lot of heavily used C/C++ code forever. The singularity will come before we can replace it all.
The nice thing about sandboxing and fuzzing can be applied to existing code.
Yes, but sandboxing and fuzzing are insufficient. As pointed out in the article Google had been fuzzing this library and it didn't find the issue. They even tweaked the fuzzing after this issue was found to specifically target the area of the vulnerability and it apparently still didn't trigger the issue.
Google Chrome also implements sandboxing and many areas. It's not feasible everywhere. So for new code / libraries we should default to a memory-safe language.
I think almost everyone agrees new greenfield projects should not choose C/C++, now that Rust has matured enough and covers essentially the same use cases.
But realistically that only solves a tiny fraction of the problem, since realistically new greenfield projects started today will likely take 10+ years to become widely used, if they do at all.
WebP was written at a time when C/C++ was still the only viable language in which to write an image compression library. Saying "things like this should be written in Rust!" just doesn't actually do anything to make software like WebP secure. Improving fuzzers and sandboxing might.
> realistically new greenfield projects started today will likely take 10+ years to become widely used, if they do at all.
I'm pretty sure that even now a lot speaks against using rust for greenfield projects precisely for that reason: few people want to integrate Rust into their build chain. You basically always have to have a compiler that's 1 release old or otherwise you cannot compile new Rust software like the very widely used time crate.
If you are a new and unproven format, do you really want to bear that hit? You could make two implementations, one in C and one in Rust, but that will mean you spread your probably quite scarce engineers over two projects.
A rewrite of a webp decoder into Rust already exists (image-rs has a decoder for webp). I'm sure there is some polishing needed but it's nothing a company the size of Google can't afford.
That's for the encoding part. The 0day we are talking about was in the webp decoder, and decoders generally are the riskier component compared to encoders. I'm not sure if Chrome ships with the webp encoder at all.
Mozilla has an interesting sandboxing strategy: They compile some of the C/C++ parts of Firefox (e.g. the ogg parser) to Wasm and then back to C. Because Wasm is memory safe, the resulting C is too.
Second problem: sandbox aren't perfect either. It's indeed useful, as part of a defense-in-depth approach, but it's far from sufficient.
Memory safety could solve the problem altogether, but then again no program is 100% memory safe, there's always some kind of primitive that uses memory-unsafe code under the hood, so it's not perfect either.
The “perfect” solution would probably be:
- use memory safe languages
- all primitives using memory unsafe stuff should get formally verified
Rust is kind of aiming at this (with things like [1] and [2]), but it's not there yet.
Strong type systems can give provably correct code. For trusted code (e.g. not third party code), sandboxing is a post-exploit mitigation. And such a post-exploit mitigation cannot necessarily guard against any class of bugs that (at least in some aspect) provably correct code can.
Yes, of course privilege separation as much as possible is still extremely valuable, but to say that sandboxing is a "better" solution, implying that one should not pursue provable correct code in favor of post-exploit mitigation, is a harsh liability. It's the same as the "oh, we don't need to use a type safe language, we have unit tests"-crowd, only worse.
> Strong type systems can give provably correct code.
"Sound" [1] type systems only guarantee the absence of some class of bugs as well. There are a lot of bug classes that remain exploitable. Memory safety happens to be a low-hanging fruit because many existing softwares are not even written in such languages.
[1] "Strong" type systems generally refer to the intolerance towards implicit type conversions or memory unsafety, and that alone doesn't make type systems provably safe in some sense.
I did not claim that everything about the code was proven correct, but a subset of properties expressable by the type system.
I agree that using the word “strong” was wrong. I basically meant it in the sense of “good”/“elaborate”, mistakenly ignoring that “strong” already has a very specific meaning in type systems. Thanks for correcting.
Yeah, anything can be implemented incorrectly. My point was specifically that ignoring memory safe languages in favor of post-exploit mitigation is foolish, not that the latter is useless.
But in practice, type systems can be proven to be sound, implementations of type checkers can be proven correct, and while it’s still possible to make mistakes there, that isn’t so much an issue in mathematics, simply because of how rigorous it is. This does not just include type systems, there’s even an effort to rebase the foundation of mathematics on a type system instead of set theory!
In other words, we likely agree that steel vaults can have holes. But we probably also agree that an average steel vault is better in keeping things in or out of it than a velvet curtain.
> In theory, perhaps. In practice, the compiler / runtime will have bugs.
They probably won't. The trusted kernel for these systems is tiny; a sandbox is orders of magnitude more complex with orders of magnitude more chances for bugs to creep in.
What does provably correct mean here? I think you mean that the code doesn’t have any memory corruption vulnerabilities. However, that is only one class of vulnerability, so more techniques then just relying on a memory safe language are required for secure software.
It means that the type system can prove certain properties for you. For example, in languages with dependent type systems like Agda, you can construct a sorted list type that the compiler will prove it sorted at all times, otherwise it won’t compile. Or a complex tree type that is always balanced. Or a set of only even numbers, and again it won’t compile otherwise…
(Sadly, if you go that far, it isn’t generally Turing complete anymore. Though in some cases that’s a good thing.)
Using memory safe languages may help, but the core issue lies on the lack of understanding of the program as a whole. Hence, the over reliance on fuzzing.
Debating what the "core issue" is may be an interesting philosophical debate, but at the end of the day, using a memory-safe language would have prevented this issue from being exploitable.
I understand, really.
But much of the discussion is already on that topic. It's not that I don't consider it unimportant.
I'm just trying to point out a bigger issue with software, in my opinion. Even more so when talking about the standard implementation of an image format as big as this. It really puts a lot into perspective on how bad software is commonly designed.
Luckily type safe languages, which includes memory safe languages, help you in understanding the program as a whole better, and strictly prevent you from doing things against that understanding, because the types literally encode automatically proven properties of your code.
Not all memory safe languages are type safe. I'd argue that expressive ML-style typing is if anything more important than memory safety, although it's hard to tell since a type-safe language will almost always have to be memory-safe.
I never claimed all memory safe languages are type safe. But memory safety is a subset of type safety. Whether dynamic or static (though I’m much in the latter camp), memory safety is achieved through information associated with the types. Whether that’s exposed to the surface or not: Rust exposes it and is static, Haskell normally does not expose it and is also static, most modern dynamic languages are memory safe but don’t expose the associated information.
> But memory safety is a subset of type safety. Whether dynamic or static (though I’m much in the latter camp), memory safety is achieved through information associated with the types.
Nah. Dynamic languages don't actually have types (even if they have something that they call types), and even typed-but-GCed languages often don't really use the types for memory safety.
They have value types. The expressions don’t have types (or rather very general types), but the values do, and it’s still possible to have a dynamic language with very rigorous value types that have a well-defined set of inhabitants.
Any dynamic language that has a string type has something similar to (for example) a buffer and a length associated with it internally. You can formalize that from the compiler’s perspective, even if you don’t expose it to the outside.
You could argue that the language doesn’t necessarily have memory safety associated with its types, because a compliant compiler or interpreter could represent strings in any way it chooses, and on some academic level there’s merit to that, but in practice you’d be rather stupid to implement the string value type in an interpreter or compiler for a language with a common string type in a memory unsafe way.
Problem is, sandboxing is harder to implement so it's often done suboptimally or not at all.