by crazygringo 454 days ago
> by generating output in a psychovisually optimal space? Perhaps frequency space (discrete cosine transform)

I've never understood the DCT to be psychovisually optimal. At lower bitrates, it degrades into ringing and blockiness that don't match a "simplified perception" at all.
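
To make the ringing concrete, here's a rough sketch, not how any real codec is implemented: a 1-D orthonormal DCT over a single 8-sample block containing a hard edge, with the four highest-frequency coefficients simply zeroed. The helper names (`dct`, `idct`) and the "keep 4 of 8" cutoff are my own assumptions for illustration; real JPEG works on 8x8 blocks and quantizes coefficients rather than dropping them outright.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a short list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, v in enumerate(x)))
    return out

def idct(coeffs):
    """Inverse of dct() above."""
    n = len(coeffs)
    out = []
    for i in range(n):
        out.append(sum(math.sqrt((1 if k == 0 else 2) / n) * c
                       * math.cos(math.pi * (i + 0.5) * k / n)
                       for k, c in enumerate(coeffs)))
    return out

# A hard dark-to-bright edge inside one 8-sample block.
edge = [0, 0, 0, 0, 255, 255, 255, 255]
coeffs = dct(edge)

# Crude stand-in for a low bitrate (assumption): zero the 4 highest frequencies.
kept = [c if k < 4 else 0.0 for k, c in enumerate(coeffs)]
approx = idct(kept)

print([round(v, 1) for v in approx])
print("min:", round(min(approx), 1), "max:", round(max(approx), 1))
```

The reconstruction swings below 0 and above 255 on what were perfectly flat sides of the edge, which is the halo you see around sharp edges in heavily compressed JPEGs.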

The frequency domain models our auditory space well, because our ears literally process frequencies. Bringing that over to the visual side has never been about "psychovisual modeling" but about existing mathematical techniques that happen to work well, despite their glaring "psychovisual" flaws.

On the other hand, yes, an HSV color space could make more sense than RGB, for example. But I'm not sure it's going to provide significant savings? I'd certainly be curious. It also might create problems though, because hue is undefined when saturation is zero, saturation is undefined when brightness is zero, etc. It's not smooth and continuous at the edges the way RGB is. And while something like CIELAB doesn't have that problem, you have the problem of keeping valid value combinations "in bounds".
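
A quick way to see the HSV problem, using Python's standard `colorsys` module (the specific sample colors are just ones I picked for illustration): exact greys and black come back with hue and saturation of 0 purely by convention, and two near-greys that are almost the same color in RGB land far apart in hue.

```python
import colorsys

# Exact grey and black: hue (and saturation) are reported as 0.0 by convention,
# since they are not actually defined there.
grey = colorsys.rgb_to_hsv(0.5, 0.5, 0.5)
black = colorsys.rgb_to_hsv(0.0, 0.0, 0.0)
print("grey :", grey)
print("black:", black)

# Two near-greys that differ by a tiny amount in RGB...
bluish = colorsys.rgb_to_hsv(0.5, 0.5, 0.501)
reddish = colorsys.rgb_to_hsv(0.501, 0.5, 0.5)

# ...end up far apart on the hue axis.
hue_gap = abs(bluish[0] - reddish[0])
print("bluish hue :", round(bluish[0], 3))
print("reddish hue:", round(reddish[0], 3))
print("hue gap    :", round(hue_gap, 3))
```

So anything predicting or generating hue directly has to cope with a value that jumps around wildly exactly where it matters least visually.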

1 comments

JPEG is good for when you want a picture to look reasonably good while throwing away ~90-95% of the data. In fact, there's a relatively new JPEG variant that lets you get even better psychovisual fidelity for the same compression level by just doing JPEG in the XYB color space, xybjpeg. JPEG is also a very simple algorithm, when compared to the ones that'd be noticeably better near 99% compression.

To beat blockiness/banding across very gradually varying color gradients (e.g., the gradient of a blue sky), JPEG XL has to whip out a lot of tricks, like handling sub-LF DCT coefficients between blocks, heterogeneous block sizes, deblocking filters for smoothing, and heterogeneous quantization maps.
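
For anyone wondering why slow gradients are the hard case, here's a toy 1-D sketch under loud assumptions: two adjacent 8-sample blocks, an orthonormal DCT, and a single uniform quantizer step (`STEP = 16`, a number I made up) applied to every coefficient. Real JPEG uses 8x8 blocks and per-frequency quantization tables, so treat this as an illustration of the mechanism only.

```python
import math

BLOCK = 8
STEP = 16  # hypothetical uniform quantizer step, not a real JPEG table entry

def dct(x):
    """Orthonormal DCT-II of one block."""
    n = len(x)
    return [math.sqrt((1 if k == 0 else 2) / n)
            * sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(x))
            for k in range(n)]

def idct(coeffs):
    """Inverse of dct() above."""
    n = len(coeffs)
    return [sum(math.sqrt((1 if k == 0 else 2) / n) * c
                * math.cos(math.pi * (i + 0.5) * k / n) for k, c in enumerate(coeffs))
            for i in range(n)]

def code_block(block):
    """Quantize each coefficient to a multiple of STEP, then decode."""
    quantized = [round(c / STEP) for c in dct(block)]
    return idct([q * STEP for q in quantized])

# A very gentle "blue sky" ramp: +0.5 per pixel across two blocks.
sky = [100 + 0.5 * i for i in range(2 * BLOCK)]
decoded = code_block(sky[:BLOCK]) + code_block(sky[BLOCK:])

steps = [b - a for a, b in zip(decoded, decoded[1:])]
inside = max(abs(s) for i, s in enumerate(steps) if i != BLOCK - 1)
seam = abs(steps[BLOCK - 1])
print("largest step inside a block:", round(inside, 3))
print("step at the block boundary :", round(seam, 3))
```

The tiny AC coefficients that encode the slope all round to zero, so each block decodes to a flat value and the whole gradient gets concentrated into one visible jump at the seam. That's the artifact all those JPEG XL tricks are there to hide.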

BTW, one of the ways different camera manufacturers aimed to position themselves as having cameras that generated the best pictures was by using custom proprietary quantization tables to optimize for psychovisual quality.

No disagreements.

I do suspect that at some point we will make a major compression breakthrough that is based on something more "psychovisual". Not Gaussian splatting, but something more akin to that -- something that directly understands geometric areas of gradating colors as primitive objects, textures as primitives, and motion as assigned to those rather than to pixels.

On the other hand, it may very well be a form of AI-based compression that does this, rather than us explicitly designing it.