Hacker News new | ask | show | jobs
by userbinator 1007 days ago
It's surprising that this isn't in a "newer" part of the image format; Huffman compression has been around for over 70 years, Canonical Huffman for a few decades less, but even JPEG uses Huffman. This is a decades-old technology that should've had the bugs worked out of its many implementations by now and there are also countless articles about how to implement it.

I've read the JPEG spec (and written a decoder) before, so I decided to look at the WebP spec: https://developers.google.com/speed/webp/docs/webp_lossless_...

The important information is in section 6. The first thing I notice is that it's not very clear how the codes are constructed, unlike the JPEG one (which actually has a ton of very readable flowcharts on the process), but it appears to be similar to LZH/deflate(zlib). The "specification" looks more like a selected set of source code fragments with accompanying descriptions.

Perhaps I should try writing a WebP decoder too, having already done GIF, JPEG, and PNG, but based on the above "specification", it's almost as if they don't want you to.

2 comments

> it's not very clear how the codes are constructed

I agree the specification really lacks examples, but it explicitly states that it uses canonical Huffman trees so that only code lengths have to be transmitted. I think this is clear enough to pinpoint the actual tree. (I don't think there is any canonical Huffman tree implementation that uses the inverse lexicographical order.)

> This is a decades-old technology that should've had the bugs worked out of its many implementations by now and there are also countless articles about how to implement it.

Because there are many implementations of Huffman trees with different trade-offs? Charles Bloom once said that the definitive 1997 paper on Huffman optimizations [1] is still not well known at that point (2010) and many optimizations were rediscovered and then forgotten, so there should be many inefficient implementations out there.

[1] https://cbloomrants.blogspot.com/2010/08/08-12-10-lost-huffm...

If you want some code to study, https://github.com/golang/image/tree/master/vp8l is a WebP-Lossless decoder in under 1200 lines of code.