Hacker News new | ask | show | jobs
by UglyToad 1106 days ago
There is some approach you can use sort of like this within PNGs themselves I think. It's been a while so I might be misrembering but effectively each "row" of data that is compressed can be encoded as the difference from the preceding row using 4 or 5 different operations.

You can achieve better compression by brute forcing the possible operations used to encode the rows to find the "most compressible" output. Not quite what you meant but sort of similar in that you try multiple approaches and pick the best.

I gave up before implementing it but in the stub I left this comment to myself " A heuristic approach is to use adaptive filtering as follows: independently for each row, apply all five filters and select the filter that produces the smallest sum of absolute values per row.".

In addition more similar to your approach PDFs support many compression filters for objects internally like RLE and ZIP so you can choose the best algorithm per object but generally it's quicker just to ZIP everything.

1 comments

> There is some approach you can use sort of like this within PNGs themselves I think. [...] You can achieve better compression by brute forcing the possible operations used to encode the rows to find the "most compressible" output.

That's the approach used by tools like optipng and pngcrush.

> I gave up before implementing it but in the stub I left this comment to myself " A heuristic approach is to use adaptive filtering as follows: independently for each row, apply all five filters and select the filter that produces the smallest sum of absolute values per row.".

The same idea can be found in the PNG standard itself: "For best compression of truecolor and grayscale images, we recommend an adaptive filtering approach in which a filter is chosen for each scanline. The following simple heuristic has performed well in early tests: compute the output scanline using all five filters, and select the filter that gives the smallest sum of absolute values of outputs. (Consider the output bytes as signed differences for this test.) This method usually outperforms any single fixed filter choice. However, it is likely that much better heuristics will be found as more experience is gained with PNG." (quoted from the PNG specification, version 1.2)