by HALtheWise 1771 days ago
One thing I never understood is why _downsampling_ is the most efficient way to compress the data about chroma into fewer bits while maximizing perceptual accuracy. It really seems like for any given target bitrate for the chroma data, there should always be a more efficient compression scheme available than simply throwing out 3/4 of the pixels and running compression algorithms on the rest. Surely modern compression can do better with a continuous low pass filter or an adaptive compression scheme that focuses data on interesting edges or something? Maybe someone here can better explain the intuition for this. I'm similarly curious about resolution in general (i.e. why does 480p upsampled ever look better than 1080p at the same bitrate) but chroma seems like a good place to start.
4 comments

>Surely modern compression can do better

I "surely" look forward to your Show HN write up on your new compression algorithm. We've been iteratively getting better at compression for some time now. It seems like every time it looks like we've wrung every bit out of DCT, someone comes up with something a little more clever. Wavelets looked promising, but never took off.

>why does 480p upsampled ever look better than 1080p at the same bitrate

That's a very vague question. Are you stating that you think 480p upsampled to 1080p at 1.5Mbps looks better than a source at 1080p at 1.5Mbps? I have a hard time believing this to be true.

Why the chroma is sub-sampled and not the luminance has to do with how the cones/rods in the eyes work. There are a lot of things you can get away with (tricks, if you will) in what the brain thinks it is seeing. Is it better to lose half the height or half the width? Is it better to lose more red than green or blue?

JPEG XL doesn't perform chroma subsampling in its native color space of XYB. https://cloudinary.com/blog/how_jpeg_xl_compares_to_other_im...
It makes sense not only from a biological point of view, as noted in the article, but also from a technological one. Almost all color cameras use Color Filter Arrays [1], meaning that for WxH resolution you don't get the WxHx3 values you would expect from the RGB images you usually consume, but only WxH (i.e. 2/3rds of the RGB image data is generated, not measured). With 4:4:4 sampling you have 12 values per 2x2 block, even though only 4 values have been measured by the camera for it. Meanwhile with 4:2:0 sub-sampling you have 6 values, which is still more than 4, but quite convenient for processing in Y-based color spaces.

[1]: https://en.wikipedia.org/wiki/Color_filter_array
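
To make the per-block arithmetic above concrete, here is a rough sketch in Python. The function name `values_per_2x2_block` is made up for illustration, and it assumes the usual J:a:b reading of the notation (a region 4 pixels wide and 2 rows tall, `a` chroma samples in the first row, `b` additional ones in the second), which holds for the common 4:4:4, 4:2:2 and 4:2:0 schemes.

```python
def values_per_2x2_block(j, a, b):
    """Stored values per 2x2 pixel block for a J:a:b sampling scheme.

    Assumes the conventional reading: the reference region is j pixels
    wide and 2 rows tall, each chroma plane has `a` samples in the first
    row and `b` additional samples in the second row.
    """
    luma = 2 * j              # one luma value per pixel in the j x 2 region
    chroma = 2 * (a + b)      # two chroma planes (e.g. Cb and Cr)
    blocks = j // 2           # number of 2x2 blocks in the j x 2 region
    return (luma + chroma) // blocks


# A CFA sensor measures one value per pixel (assumption: a Bayer-style
# mosaic), so a 2x2 block holds 4 measured values.
MEASURED_PER_2X2 = 4

for scheme in [(4, 4, 4), (4, 2, 2), (4, 2, 0)]:
    stored = values_per_2x2_block(*scheme)
    print(scheme, "stored:", stored, "measured:", MEASURED_PER_2X2)
```

So 4:4:4 stores three times what the sensor measured per block, while 4:2:0 stores 1.5 times.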

I have had the same thought. Why not do away with chroma subsampling and just compress the chroma planes more heavily than the luma plane? Does heavy compression perform worse than just throwing away 3/4 of the data?
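
For reference, "throwing away 3/4 of the data" in 4:2:0 can be pictured as something like the sketch below. It is only an illustration under assumptions: it uses a plain 2x2 box average, whereas real encoders may use other filters and chroma siting, and the name `subsample_420` is made up here. It does not settle whether heavier quantization of a full-resolution chroma plane would do better.

```python
def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging 2x2 blocks.

    `plane` is a list of rows of numbers; even width and height are
    assumed. A box average is an assumption, not what any particular
    codec mandates.
    """
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


# Toy 4x4 chroma plane: 16 samples in, 4 samples out.
cb = [
    [10, 20, 30, 40],
    [10, 20, 30, 40],
    [0, 0, 100, 100],
    [0, 0, 100, 100],
]
small = subsample_420(cb)
kept = sum(len(row) for row in small)
total = sum(len(row) for row in cb)
print(small, f"kept {kept} of {total} samples")
```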