by crazygringo
454 days ago
> by generating output in a psychovisually optimal space? Perhaps frequency space (discrete cosine transform)

I've never understood the DCT to be psychovisually optimal at all. At lower bitrates, it degrades into ringing and blockiness that don't match a "simplified perception" at all.

The frequency domain models our auditory space well, because our ears literally process frequencies. Bringing that over to the visual side has never been about "psychovisual modeling" but about existing mathematical techniques that happen to work well, despite their glaring "psychovisual" flaws.

On the other hand, yes, an HSV color space could make more sense than RGB, for example. But I'm not sure it would provide significant savings. I'd certainly be curious. It might also create problems, though, because hue is undefined when saturation is zero, saturation is undefined when brightness is zero, etc. It's not smooth and continuous at the edges the way RGB is. And while something like CIELAB doesn't have that problem, you have the problem of keeping valid value combinations "in bounds".
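To make the HSV edge problem concrete, here is a minimal sketch using Python's standard `colorsys` module. The `EPS` value and the `hsv` helper are just illustrative choices of mine, not anything from the article. It shows that RGB triples that are practically indistinguishable can land on wildly different hue or saturation values near gray and near black.

```python
import colorsys

EPS = 1e-6  # illustrative perturbation, far below anything visible


def hsv(r, g, b):
    """Convert RGB in [0, 1] to (hue, saturation, value), each in [0, 1]."""
    return colorsys.rgb_to_hsv(r, g, b)


# Three nearly identical mid-grays, each nudged toward one primary.
near_gray = {
    "reddish": (0.5 + EPS, 0.5, 0.5),
    "greenish": (0.5, 0.5 + EPS, 0.5),
    "bluish": (0.5, 0.5, 0.5 + EPS),
}
for name, rgb in near_gray.items():
    h, s, v = hsv(*rgb)
    print(f"{name:9s} hue={h:.3f} sat={s:.6f} val={v:.6f}")

# Two nearly identical near-blacks: saturation flips between 0 and 1.
near_black = {
    "dark gray": (EPS, EPS, EPS),
    "dark red": (EPS, 0.0, 0.0),
}
for name, rgb in near_black.items():
    h, s, v = hsv(*rgb)
    print(f"{name:9s} hue={h:.3f} sat={s:.6f} val={v:.6f}")
```

The three grays differ by one part in a million in RGB, yet their hues sit a third of the color wheel apart, and the two near-blacks get saturation 0 and 1 respectively. A model emitting HSV would have to represent those jumps somehow, which is the discontinuity I mean.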
|
To beat blockiness/banding across very gradually varying color gradients (think, e.g., of the gradient of a blue sky), JPEG XL has to whip out a lot of tricks, like handling sub-LF DCT coefficients between blocks, heterogeneous block sizes, deblocking filters for smoothing, and heterogeneous quantization maps.
BTW, one way camera manufacturers tried to position their cameras as producing the best pictures was by using custom, proprietary quantization tables tuned for psychovisual quality.
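For anyone who wants to see the gradient problem in isolation, here is a toy sketch in plain Python. It is a 1-D block DCT with a single uniform quantization step, which is my simplification and not what JPEG or JPEG XL actually do (they use 2-D transforms and per-frequency quantization tables). The ramp values, the step of 16, and all function names are assumptions chosen for illustration.

```python
import math

N = 8  # block size, matching the 8-sample blocks of baseline JPEG


def _scale(k, n):
    """Orthonormal DCT scaling factor for frequency index k."""
    return math.sqrt((1 if k == 0 else 2) / n)


def dct(block):
    """Orthonormal 1-D DCT-II of one block."""
    n = len(block)
    return [
        _scale(k, n) * sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(block)
        )
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(
            _scale(k, n) * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, c in enumerate(coeffs)
        )
        for i in range(n)
    ]


def roundtrip(signal, step):
    """Block DCT, uniform quantization with the given step, inverse DCT."""
    result = []
    for start in range(0, len(signal), N):
        coeffs = dct(signal[start:start + N])
        quantized = [step * round(c / step) for c in coeffs]
        result.extend(idct(quantized))
    return result


def largest_jump(signal):
    """Largest absolute difference between neighbouring samples."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))


# A very gentle ramp, standing in for one row of a blue sky.
gradient = [100 + 0.25 * i for i in range(32)]
coarse = roundtrip(gradient, step=16)

print("original largest step:     ", largest_jump(gradient))
print("reconstructed largest step:", round(largest_jump(coarse), 2))
print([round(v, 1) for v in coarse])
```

With this coarse step every AC coefficient of the ramp rounds to zero, so each block collapses to a flat value and the smooth slope turns into a staircase with visible jumps at some block boundaries. That staircase is the banding that the tricks above are meant to hide.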