Hacker News new | ask | show | jobs
by sreekotay 1645 days ago
If QOI is interesting because of speed, you might take a look at fpng, a recent/actively developed png reader/writer that is achieving comparable speed/compression to QOI, while staying png compliant.

https://github.com/richgel999/fpng

Disclaimer: have not actively tried either.

2 comments

I find it interesting that QOI avoids any kind of Huffman style coding.

Huffman encoding lets you store frequently used values in fewer bits than rarely occurring values, but the cost of a naïve implementation is a branch on every encoded bit. You can mitigate this by making a state machine keyed by "accumulated prefix bits" and as many bits as you want to process in a whack, these tables will blow out your L1 data cache and trash a lot of your L2 cache as well.¹

The "opcode" strategy in QOI is going to give you branches, but they appear nearly perfectly predictable for common image types², so that helps. It has a table of recent colors, but that is only of a few cache lines.

In all, it seems a better fit for the deep pipelines and wildly varying access speeds across cache and memory layers which we find today.

¹ I don't think it ever made it into a paper, but in the mid-80s, when the best our Vax ethernet adapters could do was ~3Mbps I was getting about 10Mbps of decompressed 12 bit monochrome imagery out of a ~1.3MIP computer using this technique.

² I also wouldn't be surprised if this statement is false. It just seems that for continuous tone images one of RGBA, DIFF, or LUMA is going to win for any given region of a scan line.

Meta: ␄ (https://en.wikipedia.org/wiki/End-of-Transmission_character) isn’t the right control character when footnotes follow; ␃ (https://en.wikipedia.org/wiki/End-of-Text_character) is a better fit, and ␌ (https://en.wikipedia.org/wiki/Form_feed) would be a decent choice too.

(I write comments with footnotes in the same style as you, but use “—⁂—” as the separator, via Compose+h+r (name from the HTML tag horizontal rule). Good fun being able to use Compose+E+O+T, Compose+E+T+X and Compose+F+F in this comment; I added the full set to my .XCompose years ago.)

Entirely unrelated to the OP, but comments like this are why I love HN.
One thing to note is that QOI composes really nicely with high quality entropy encoders like LZ4 and ZSTD. LZ4 gives a roughly 5% size reduction with negligible speed impact, and ZSTD gives a 20% size reduction with moderate speed impact (https://github.com/nigeltao/qoi2-bikeshed/issues/25).
I would think that you could use a hybrid approach where you have a table that is perhaps 9 or 10 bits wide and covers many of the more common codes, which will by definition be more common. Should be small enough to fit in the cache. Then do something slower for the very long codes. This way you avoid difficult branches most of the time.
Exactly. Normally you use second-level tables for the long codes. In the first table, if code doesn't reach a leaf, tbl[code] holds a pointer to the next table to use.

For example, here's JPEG XL's Huffman decoder: https://github.com/libjxl/libjxl/blob/1c67f4eff0464e6241f365.... The branch uses a second table if the code was too long.

Funnily enough it's just a few days since I did some similar code to support writing PNGs from a small embedded device. In this case the full deflate algorithm seemed like overkill in memory and CPU requirement, and most of the images were probably going to be served over a LAN anyway. https://twitter.com/toitpkg/status/1471986776357097475

https://github.com/toitlang/toit/commit/65c6c1bd7138f9ebced4... It's not as highly optimized as this effort though, and it just uses the standard huffman table that is built into deflate, rather than a static-but-custom one.