by corybrown 3388 days ago
Very cool. I'm not an expert, but does JPEG generally have a ton of flexibility in compression? Why so much difference in sizes?

Three main methods:

1) YUV420 vs YUV444. Guetzli practically always goes for YUV444.

2) Choosing quantization matrices.

3) After normal quantization, choose even more zeros. JPEG encodes zeros very efficiently.

When doing the above, increase the errors where they matter least (certain RGB values hide errors in certain components, and certain types of visual noise hide other kinds of noise).
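To put rough numbers on method 1: YUV420 halves both chroma planes in each direction, so it stores about half as many samples as YUV444 before any quantization happens. A minimal sketch of that arithmetic (`sample_count` is a made-up helper for illustration, not part of any encoder):

```python
import math

def sample_count(width, height, subsampling):
    """Total samples stored across the Y, Cb and Cr planes before the DCT."""
    luma = width * height
    if subsampling == "444":
        # Chroma kept at full resolution.
        chroma_w, chroma_h = width, height
    elif subsampling == "420":
        # Chroma halved horizontally and vertically, rounded up.
        chroma_w, chroma_h = math.ceil(width / 2), math.ceil(height / 2)
    else:
        raise ValueError("only '444' and '420' are handled in this sketch")
    return luma + 2 * chroma_w * chroma_h

full = sample_count(640, 480, "444")  # 921600
sub = sample_count(640, 480, "420")   # 460800
print(full, sub, sub / full)
```

So an encoder that stays on YUV444 gives up that head start and has to win the bytes back in methods 2 and 3.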

Step 3), choosing more zeros, can be generalized with trellis quantization, which does a search for the best values to encode for each block for the best distortion-per-rate score, where distortion can be any metric (edit: apparently guetzli does some sort of whole frame search for this). mozjpeg does trellis with effectively the PSNR-HVS metric. Because the other two steps are only one setting that affects the entire picture, I do wonder how Guetzli would perform if it was just a wrapper around mozjpeg.
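For a feel of the distortion-per-rate idea, here is a much-simplified sketch. It is not real trellis quantization and not what mozjpeg or Guetzli do: it only decides, one coefficient at a time, between the nearest quantized level and zero, using squared error as the distortion and a toy bit-cost model. The function name, the cost model and the sample numbers are all assumptions for illustration.

```python
def rd_zero(coeffs, steps, lam):
    """Quantize coefficients, zeroing a level when that lowers D + lam * R.

    D is squared error in the coefficient domain. R is a toy bit estimate,
    not JPEG's actual Huffman/run-length cost.
    """
    def toy_bits(level):
        # Assumption: zeros are nearly free, nonzero cost grows with magnitude.
        return 0.5 if level == 0 else 2 + 2 * abs(level).bit_length()

    levels = []
    for c, q in zip(coeffs, steps):
        nearest = round(c / q)
        best_level, best_cost = None, None
        # Try the ordinary rounded level first, then the "just send zero" option.
        for candidate in (nearest, 0):
            dist = (c - candidate * q) ** 2
            cost = dist + lam * toy_bits(candidate)
            if best_cost is None or cost < best_cost:
                best_level, best_cost = candidate, cost
        levels.append(best_level)
    return levels

coeffs = [-56.0, 31.0, 13.0, -11.0, 4.0, 2.0]  # made-up DCT coefficients
steps = [8, 12, 12, 16, 20, 24]                # made-up quantizer steps

print(rd_zero(coeffs, steps, lam=0))   # plain rounding
print(rd_zero(coeffs, steps, lam=40))  # rate matters: one more coefficient becomes zero
```

A real trellis search considers more candidate levels per coefficient and accounts for how zero runs interact in the entropy coder, which this per-coefficient version ignores.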
Yes, JPEG encoding has a ton of flexibility. You rearrange each block of pixels using the discrete cosine transform, which tends to pack more significant values towards one corner, and then you have lots of freedom over how to quantize those values. See https://en.wikipedia.org/wiki/JPEG#Quantization
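A minimal sketch of that DCT-then-quantize step on a single 8x8 block, in plain Python so it runs without dependencies. The quantization matrix here is a toy one whose step grows with frequency, not one of the standard JPEG tables, and the input block is a made-up smooth gradient.

```python
import math

N = 8

def dct_2d(block):
    """Orthonormal 8x8 DCT-II, written out directly (slow but dependency-free)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

def quantize(coeffs, qmatrix):
    """Divide each coefficient by its step and round to the nearest integer."""
    return [[round(coeffs[u][v] / qmatrix[u][v]) for v in range(N)]
            for u in range(N)]

# Smooth gradient block, level-shifted by 128 as JPEG does before the DCT.
block = [[(4 * x + 2 * y + 100) - 128 for y in range(N)] for x in range(N)]

# Toy quantization matrix (assumption): coarser steps at higher frequencies.
qmatrix = [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]

q = quantize(dct_2d(block), qmatrix)
zeros = sum(row.count(0) for row in q)
for row in q:
    print(row)
print("zero coefficients:", zeros, "of", N * N)
```

For smooth content like this, the few surviving nonzero values sit in the top-left (low-frequency) corner, and changing `qmatrix` is where the encoder's freedom comes in.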

On top of that, you could tweak the quantized values themselves to make them more compressible.

There's less flexibility than you might think - you get only one choice of quantizer and quantization matrix for the entire frame. So pretty much your only option is to twiddle the values themselves. This is usually done with trellis quantization, such as in mozjpeg. Guetzli seems to implement something simpler that just sets increasing numbers of coefficients to zero (based on my cursory reading of the source code).
I'm afraid Guetzli is quite a lot more complex. It does a global search on this, i.e., quantization decisions in neighboring blocks may impact the quantization decisions on this block. Also, quantization decisions have cross-channel impact between YUV channels.
There is no block to block prediction other than DC prediction, so is this effect due to your distortion function spanning multiple blocks? Same for cross YUV channels, because your metric is in RGB space?

edit: on a second read-through I found the paper [1], which explains it. The answer is basically "yes", where the large scale distortion function is basically activity masking. Normally this would be implemented with delta-QPs, but because JPEG doesn't have that, Guetzli uses runs of zeroes instead.

[1] https://arxiv.org/pdf/1703.04421

This comes from the internal use of butteraugli, and from making the quantization decisions depend on butteraugli.

Butteraugli uses an 8x8 FFT, but computes it every 3x3 pixels, creating coverage at block boundaries. In later stages of the butteraugli calculation, values are aggregated from an even larger area. Block boundary artefacts are taken into account by this and impact quantization decisions.

Butteraugli operates neither in RGB nor YUV. It has a new color space that is a hybrid of tri-chromatic colors and opponent colors. Black-to-yellow and red-to-green are opponent, but blue is modeled closer to tri-chromatic. As a simpler explanation, you can think of it as follows: first apply inverse gamma correction, second apply a 3x4 transform for rgb, third apply gamma correction, fourth calculate r - g, r + g and keep blue separate.

Do you have / plan a paper describing butteraugli itself?

It seems like that's where most of the magic lies. Also peculiarities of human vision are one of my oddball interests, after compression of course. :)

The more bits you're willing to lose during quantization, the more zeroes the resulting bitstream will have, the better it will compress.

The more bits you lose during quantization, the more ringing and artifacts you can expect after the IDCT process.

So the tradeoff is quite literally artifacts for smaller size.
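You can see the tradeoff in a toy 1-D version: take one row of 8 pixels, DCT it, quantize with a coarser and coarser step, and look at how many coefficients become zero versus how far the reconstruction drifts. This is only a sketch with a single uniform step and made-up pixel values; real JPEG works on 8x8 blocks with a per-frequency matrix.

```python
import math

N = 8

def _scale(k):
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct(x):
    """Orthonormal 8-point DCT-II."""
    return [_scale(k) * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                            for n in range(N))
            for k in range(N)]

def idct(X):
    """Inverse of dct() above (DCT-III with the same scaling)."""
    return [sum(_scale(k) * X[k] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                for k in range(N))
            for n in range(N)]

def tradeoff(samples, step):
    """Return (number of zero coefficients, worst per-pixel error) for one step size."""
    levels = [round(c / step) for c in dct(samples)]
    restored = idct([level * step for level in levels])
    max_err = max(abs(a - b) for a, b in zip(samples, restored))
    return levels.count(0), max_err

samples = [10, 12, 50, 52, 51, 49, 11, 9]  # an edge-like row of pixels (made up)
for step in (1, 8, 32, 128):
    zeros, err = tradeoff(samples, step)
    print(f"step={step:4d} zeros={zeros} max_error={err:.2f}")
```

The zero count can only go up as the step grows; the error around the two edges is the 1-D cousin of the ringing you see in heavily compressed JPEGs.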

This compressor seems to be cleverer about where to lose data than libjpeg.