Hacker News new | ask | show | jobs
by lifthrasiir 1007 days ago
To me, this bug is most similar to the Timsort bug back in 2015 [1].

Timsort is an ingenious hybrid sorting algorithm originated from CPython and many implementations including OpenJDK adopted it mostly via a source-by-source translation. Timsort particularly maintains a stack of sorted runs, and due to the construction there is a small enough finite limit in the maximum possible stack size. However the original CPython implementation didn't exactly match what was proven, so there were rare cases where stack overflow could happen. So this was a serious security bug in CPython, but wasn't in OpenJDK because Java instead threw an exception in that case.

Similarly, this WebP bug occurred because the largest table size was formally proven but it didn't match what was fed to the source code. This kind of bugs is not only hard to verify but also hard to review, because of course there is a proof and the source code seems to match the proof, so it should be okay! This bug suggests strong needs for approachable formal verification, including the use of memory-safe languages (type systems can be regarded as a weak form of formal verification), not human reviews.

[1] http://envisage-project.eu/wp-content/uploads/2015/02/sortin...

3 comments

Specifically, since performance is crucial for this type of work, it should be written in WUFFS. WUFFS doesn't emit bounds checks (as Java does and as Rust would where it's unclear why something should be in bounds at runtime) it just rejects programs where it can't see why the indexes are in-bounds.

https://github.com/google/wuffs

You can explicitly write the same checks and meet this requirement, but chances are since you believe you're producing a high performance piece of software which doesn't need checks you'll instead be pulled up by the fact the WUFFS tooling won't accept your code and discover you got it wrong.

This is weaker than full blown formal verification, but not for the purpose we care about in program safety, thus a big improvement on humans writing LGTM.

Wuffs author here.

It's not full-blown formal verification but it doesn't have to be. Unlike most formal verification projects that I've seen, the proof part of Wuffs isn't about correctness (proving that "this implementation satisfies the PNG specification"), it's about safety (proving that "this implementation won't read/write out of bounds"), which is much more tractable. Especially as the PNG spec (or WebP spec) doesn't mandate how to treat malicious input (like unbalanced Huffman tables).

Wuffs doesn't have a WebP decoder yet but it's literally in the Roadmap and I've previously written the golang.org/x/image/webp package in the Go programming language.

Wuffs does have a PNG decoder, PNG uses Deflate compression and Deflate and WebP's "decode a Huffman tree" data tables are very similar. Wuffs also has an equivalent of zlib's enough.c program mentioned in the original blog post to calculate worst-case memory requirements. Wuffs' version is called script/print-deflate-huff-table-size.go and it says that, for a 9-bit primary table, we need 852 table entries. Round that up to 1024, for nearest power-of-two.

If you look for HUFFS_TABLE_SIZE = 1024 and HUFFS_TABLE_MASK = 1023 in Wuffs' std/deflate source code, you'll notice that Wuffs' Deflate decoder isn't susceptible to the same problem as the C/C++ WebP decoder the original blog post discussed. This is because, unless the Wuffs compiler can prove otherwise, the code doesn't just look up the table like `this.huffs[0][i]`, it's like `this.huffs[0][i&mask]` and `i&mask` is always in bounds. The mask is either HUFFS_TABLE_MASK or it's of a refinement type guaranteed <= 511. The array is always (statically) allocated with 1024 entries even though enough.c says that, with dynamic allocation, we could possibly get away with a smaller table.

As you already know, for Wuffs (and unlike C/C++), taking out the `&mask` will lead to a compile-time error. Wuffs code won't compile unless the compiler can prove that array indexes are always within bounds.

From the grandparent:

> this WebP bug occurred because the largest table size was formally proven but it didn't match what was fed to the source code

With WebP+enough.c this 'largest table size' was calculated by exhaustive, brute-force search (not really a formal proof) but it was based on assumptions (balanced codes) that didn't match actual (malicious) input. Or, in zlib-the-library, there's other C code (https://github.com/madler/zlib/blob/ac8f12c97d1afd9bafa9c710...) that rejects unbalanced Huffman codes, but IIUC similar enforcement was (until very recently) missing in libwebp.

With Wuffs, even if the worst-case calculation was based on an incorrect model (or forgetting to separately reject unbalanced codes), the end result (on malicious input) might be "the wrong pixels" but it shouldn't be "buffer overflow".

Hi Nigel, Thanks for interjecting (and of course Thanks for WUFFS). I'm glad to hear that WebP is "on the Roadmap" but I think this sort of work ought to have been considered by vendors as a must-have, high priority project, rather than being sort of vaguely welcomed sideshow that's not on their critical path.

Yes, I actually think that the choice to unequivocally prove safety rather than to attempt to prove that you implemented something specific (but with the risk that our specification is wrong), is almost invariably the right choice for this problem space. It would not have been appropriate for the TLS 1.3 protocol (which has a machine proof that it satisfies our intended criteria stated in the RFC, modulo the Selfie attack and assuming our cryptographic primitives all do what they said they do), but it's exactly the right thing for say a DOCX parser, or ZIP parser, or a JPEG compressor or similar for which WUFFS is the right choice.

Slightly off-topic but I can't resist the urge to ask.

Why is the logo (https://raw.githubusercontent.com/google/wuffs/main/doc/logo...) no longer shown in the readme?

I was pretty sure I had heard of the project before but I had to find the logo to be absolutely certain, it's truly one of a kind.

The logo is no longer shown because I think the tweet-screenshot is a more compelling introduction for people new to Wuffs.
> With WebP+enough.c this 'largest table size' was calculated by exhaustive, brute-force search (not really a formal proof) but it was based on assumptions (balanced codes) that didn't match actual (malicious) input.

Maybe I'm stretching the definition, but the exhaustive search is also a proof, especially when it can be done quickly or an efficient proof certificate can be generated (enough.c is the former). I even think that enough.c can be modified to generate a reasonably sized trace to aid verification.

As an intellectual 'puzzle', I'd be curious to see what that sort of trace-guided verification could look like, if that's even feasible. My recollection of enough.c is that, algorithmically, it's sufficiently complicated that I need to really think hard when I'm reading the code. But also, it uses clever data structures specifically to reduce the memory requirements at runtime.

In comparison, "array size is always 1024, array index is always bitwise-anded with 1023, therefore always in-bounds" is undeniably simple and, per "The Fastest, Safest PNG Decoder in the World", practical and fast.

While I agree that you don't need the tight bound (especially because I think the tight bound for incomplete trees is not much higher than one for complete trees), you do need to be confident that 1024 is either a loose but correct bound or highly unlikely to be reached in practice. So you need to calculate some kind of bounds anyway... And this is another reason I thought something more than Wuffs is desirable [1]; a dynamic memory allocation, when allowed, makes many things easier to implement and prove.

[1] https://news.ycombinator.com/item?id=37480019

1024 is a loose but correct bound. That's what enough.c (or script/print-deflate-huff-table-size.go) says.

Stepping back, I'm not sure if I understand what you're saying. Perhaps we're just disagreeing on how "formal" (as in, part of a "formal verification" process) the enough.c program (or equivalent) needs to be?

---

As for dynamic memory allocation, Wuffs has a mechanism for the callee (Wuffs code) to tell the caller (C/C++ glue code) that it needs N bytes of "scratch space" or "work buffer", where N is dynamically dependent (e.g. dependent on the image dimensions in pixels).

It's not "dynamic memory allocation" in that Wuffs still doesn't malloc/new and free/delete per se. But it is dynamically (run time, not compile time) sized.

Grepping for "work buffer" or "workbuf" in Wuffs' example/ or std/ code (or SkWuffsCodec.cpp) should give some leads, if you want to study some code. The workbuf_len is often zero but, for Wuffs' std/jpeg and std/png, it is positive. Manage the work buffer in the same way as the pixel buffer: it's always caller-owned and callee-borrowed.

That's a neat trick with the index masking. I will have to try using Wuffs again.
Liquid Types [1] achieve the same in existing programming languages: no bounds checks at runtime, and reject programs without an in-bounds proof.

For a demo, see "Scrap your Bounds Checks with Liquid Haskell", giving a 6x speed up in high-performance parsing of UDP packets.

[1]: https://news.ycombinator.com/item?id=37349276 "A Gentle Introduction to Liquid Types"

[2]: https://github.com/Gabriella439/slides/blob/main/liquidhaske... "Scrap your Bounds Checks with Liquid Haskell"

Yeah, Wuffs can be classified as memory-safe languages in this case and should be endorsed as much as possible. I mean, I said a lot about the limitation of Wuffs in the last conversation, but that's because I believe something like Wuffs should be used more widely, and the current limitation of Wuffs can hinder the progress. We need several angles of attack to make it real, and Wuffs is only one of them.
> type systems can be regarded as a weak form of formal verification

Depends on the type system, right? I was recently looking at Idris which is a general purpose programming language but apparently can be used for proofs.

TIL Stack Overflow is not just the name of a website.
If that's a serious TIL then you might enjoy the Ye Olde Timey classic

Smashing The Stack For Fun And Profit (1996)

(OG Phrack link) http://phrack.org/issues/49/14.html (Text-zine)

(TISM Berkeley CS coursework) https://inst.eecs.berkeley.edu/~cs161/fa08/papers/stack_smas... (PDF)

But unrelated to the stack overflow in Tim sort.
Not entirely - I wrote a lot of numerical backend code intended for 24/7/365 continuous use in the face of potentially any input .. aquisition | instrumentation equipment failures can produce glitch inputs just as damaging as delibrately crafted malicious inputs from a hacker reverse engineering existing code.

The common ground being code that has to be robust in the face of any input has to be (correctly!) verified, constrained within hard resource limits, have graceful fail over behaviour, some measure of sanity checking, etc.

Otherwise, if an e\/il hax0r doesn't get you .. mother nature and sheer bloody chance will.

I would really recommend trying to write a little C program that deliberately causes a stack overflow. It is completely harmless and is good fun if you've never encountered this kind of low-level systems code before.

It's also pretty fun to try asking for weird amounts of memory with 'malloc' or seeing how many times you can ask the OS for no memory before something fails. You might need to force-reboot your computer after that though!