by dipflow 89 days ago
Whenever I see those 'blocky' artifacts on a low-quality image, I used to just think of it as 'bad tech.' After reading this, it's cool to realize you're actually seeing the 8x8 DCT grid itself. You're literally seeing the math break down because there wasn't enough bit-budget to describe those high-frequency sine waves. It’s like looking at the brushstrokes on a digital painting.
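To make the "not enough bit-budget" point concrete, here is a minimal Python sketch (my own illustration, not something from the article). It assumes one uniform quantization step of 50 for every coefficient, whereas real JPEG uses a per-frequency quantization table and a level shift, and the 8x16 ramp image is made up. With a step that coarse, every AC coefficient rounds to zero, each 8x8 block collapses to a flat tile, and the smooth ramp turns into a visible jump at the block boundary.

```python
import math

N = 8  # JPEG block size

def dct_1d(x):
    """Orthonormal DCT-II of a length-N sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt((1 if k == 0 else 2) / N)
        out.append(scale * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N)
                               for n in range(N)))
    return out

def idct_1d(X):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    out = []
    for n in range(N):
        out.append(sum(math.sqrt((1 if k == 0 else 2) / N) * X[k]
                       * math.cos(math.pi * (n + 0.5) * k / N)
                       for k in range(N)))
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def apply_2d(f, block):
    """Apply a 1-D transform to every row, then to every column."""
    rows = [f(r) for r in block]
    return transpose([f(c) for c in transpose(rows)])

def crush(block, step):
    """DCT, quantize every coefficient with one coarse step, inverse DCT."""
    coeffs = apply_2d(dct_1d, block)
    quantized = [[round(c / step) * step for c in row] for row in coeffs]
    return apply_2d(idct_1d, quantized)

# A gentle horizontal ramp, 8 rows by 16 columns: two adjacent 8x8 blocks.
image = [[100 + x for x in range(16)] for _ in range(N)]
left = crush([r[:8] for r in image], step=50)    # hypothetical uniform step
right = crush([r[8:] for r in image], step=50)

row = left[0] + right[0]  # first scanline across both blocks
print([round(v, 2) for v in row])
```

The printed scanline is eight identical values followed by eight identical, higher values: the ramp is gone and only the block grid is left.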

The default implementation of the decoding adds the artifacts.

This tool uses more clever math to replace what's missing: https://github.com/victorvde/jpeg2png

It just blurs out the details. I'd rather have a sharp image with artifacts.
Why?

You're not seeing the actual details either way.

The blurred version feels honest -- it's not showing you anything more than what has been encoded.

The sharp image feels confusing -- it's showing you a ton of detail that is totally wrong. "Detail" that wasn't in the original, but is just artifacts.

Why would you prefer distracting artifacts over a blurred version?

The details are quite real, and they make the image far more comprehensible.

Get a picture of grass, save it as a JPEG at 15% quality... It still looks like grass. Then run it through jpeg2png... The output looks like a green smear. You might not even be able to tell that it's supposed to be grass. jpeg2png just blurs the hell out of images.

Here's a side-by-side: https://ibb.co/99C0F34d

The details were destroyed long ago by the poor compression, you aren't getting them back either way.
You're talking utter nonsense.

Also, if your software is for whatever reason using the original libjpeg in its modern (post-6b) incarnation [1]: from version 7 onwards, the new (and still current) maintainer switched the algorithm for chroma up-/downsampling from classic pixel interpolation to DCT-based scaling, claiming it's mathematically more beautiful and (apart from the unavoidable information loss on the first downscaling) perfectly reversible [2].

The problem with that approach, however, is that DCT scaling is block-based, so for classic 4:2:0 subsampling, each 16x16 chroma block in the original image is now individually downscaled to 8x8 and, perhaps more importantly, later individually upscaled back to 16x16 on decompression.

Compared to classic image resizing algorithms (bilinear scaling or whatever), this block-based upscaling can and does introduce additional visual artefacts at the block boundaries, which, while somewhat subtle, are still large enough to be actually borderline visible even when not quite pixel-peeping. ([3] notes that the visual differences between libjpeg 6b/turbo and libjpeg 7-9 on image decompression are indeed of a borderline visible magnitude.)
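For anyone who wants to see the effect in isolation, here is a 1-D toy version in Python. It is a sketch of the general idea (truncate the DCT spectrum to downscale, zero-pad it to upscale), not libjpeg's actual code, and the function names and the 32-sample ramp are my own. Each 16-sample block is scaled to 8 samples and back on its own, just like the per-block chroma handling described above.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a sequence of any length."""
    n_pts = len(x)
    return [math.sqrt((1 if k == 0 else 2) / n_pts)
            * sum(v * math.cos(math.pi * (n + 0.5) * k / n_pts)
                  for n, v in enumerate(x))
            for k in range(n_pts)]

def idct(X):
    """Inverse of dct (orthonormal DCT-III)."""
    n_pts = len(X)
    return [sum(math.sqrt((1 if k == 0 else 2) / n_pts) * c
                * math.cos(math.pi * (n + 0.5) * k / n_pts)
                for k, c in enumerate(X))
            for n in range(n_pts)]

def dct_downscale(block16):
    """16 samples -> 8: keep only the 8 lowest-frequency coefficients."""
    coeffs = dct(block16)[:8]
    return idct([c * math.sqrt(8 / 16) for c in coeffs])

def dct_upscale(block8):
    """8 samples -> 16: zero-pad the spectrum, then inverse-transform."""
    coeffs = [c * math.sqrt(16 / 8) for c in dct(block8)]
    return idct(coeffs + [0.0] * 8)

# A perfectly smooth ramp spanning two 16-sample blocks.
signal = [float(n) for n in range(32)]
restored = []
for start in (0, 16):  # each block is processed independently
    restored += dct_upscale(dct_downscale(signal[start:start + 16]))

errors = [r - s for r, s in zip(restored, signal)]
print([round(e, 3) for e in errors])
```

The error pattern repeats once per block and flips sign right at the block boundary (samples 15 and 16), so a ramp that was smooth going in comes out with a small kink aligned to the block grid. The effect is small, which fits the "borderline visible" description.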

I stumbled across this detail after having finally upgraded my image editing software [4] from the old freebie version I'd been using for years (it was included with a computer magazine at some point) to its current incarnation, which came with a libjpeg version upgrade under the hood. Not long afterwards I noticed that for quite a few images, the new version introduced some additional blockiness when decoding JPEG images (also subsequently exacerbated by some particular post-processing steps I was doing on those images), and then I somehow stumbled across this article [3] which noted the change in chroma subsampling and provided the crucial clue to this riddle.

Thankfully, the developers of that image editor were (still are) very friendly and responsive and actually agreed to switch out the jpeg library to libjpeg-turbo, thereby resolving that issue. Likewise, luckily few other programs and operating systems seem to actually use modern libjpeg, usually preferring libjpeg-turbo or something else that continues using regular image scaling algorithms for chroma subsampling.

[1] Instead of libjpeg-turbo or whatever else is around these days.

[2] Which might be true in theory, but I tried de- and recompressing images in a loop with both libjpeg 6b and 9e, and didn't find a significant difference in the number of iterations required until the image converged to a stable compression result.

[3] https://informationsecurity.uibk.ac.at/pdfs/BHB2022_IHMMSEC....

[4] PhotoLine

eh, it is bad tech. modern compression algorithms hide the blocks a lot more because blocking is the most visible artifact
It's a perfectly pragmatic engineering choice. Blocking is visible only when the compression is too heavy. When degradation is imperceptible, then the block edges are imperceptible too, and the problem doesn't need to be solved (in JPEG imperceptible still means 10:1 data size reduction).

Later compression algorithms were focused on video, where the aim was to have good-enough low-quality approximations.

Deblocking is an inelegant hack.

Deblocking hurts high quality compression of still images, because it makes it harder for codecs to precisely reproduce the original image. Blurring removes details that the blocks produced, so the codec has to either disable deblocking or compensate with exaggerated contrast (which is still an approximation). It also adds a dependency across blocks, which complicates the problem from independent per-block computation to finding a global optimum that happens to flip between frequency domain and pixel hacks. It's no longer a neat mathematical transform with a closed-form solution, but a pile of iterative guesswork (or just not taken into account at all, and the codec wins benchmarks on PSNR, looks good in side by side comparisons at 10% quality level, but is an auto-airbrushing texture-destroying annoyance when used for real images).

The Daala project tried to reinvent it with better mathematical foundations (lapped transforms), but in the end a post-processing pass of blurring the pixels has won.

I only recently learned that JPEG and MPEG-1 were designed for near-lossless compression, so the massive bitrate reductions which came further down the road had nothing to do with the original design.

"Inelegant" is the right word; it's hard to shake off the feeling that we might have missed something important. I suspect the next big breakthrough might be waiting for researchers to focus on lower-quality compression specifically, rather than requiring every new codec to improve the state of the art in near-lossless compression.

> for researchers to focus on lower-quality compression specifically

JPEG-XL already does this because it uses VarDCT (Variable-size Discrete Cosine Transform) aka adaptive block sizes (2×2 up to 256×256). Large smooth areas use huge blocks and fine detail uses small blocks to preserve detail. JXL spends bits where your eyes care most instead of evenly across the image. It also has many techniques it uses to really focus on keeping edges sharp.
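As a rough picture of "big blocks for smooth areas, small blocks for detail", here is a toy Python sketch. To be clear about the assumptions: this is a plain variance-based quadtree split that I made up for illustration, not JPEG XL's actual block-selection heuristics (which also include rectangular blocks and a far smarter cost model), and the threshold, minimum size, and test image are all hypothetical.

```python
def variance(img, x, y, size):
    """Pixel variance of the size x size square whose top-left corner is (x, y)."""
    vals = [img[j][i] for j in range(y, y + size) for i in range(x, x + size)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def choose_blocks(img, x, y, size, min_size=8, threshold=25.0):
    """Return a list of (x, y, size) blocks covering the square at (x, y).

    A square is kept whole if it is smooth enough (low variance) or already
    at the minimum size; otherwise it is split into four quadrants.
    """
    if size <= min_size or variance(img, x, y, size) <= threshold:
        return [(x, y, size)]
    half = size // 2
    blocks = []
    for dy in (0, half):
        for dx in (0, half):
            blocks += choose_blocks(img, x + dx, y + dy, half,
                                    min_size, threshold)
    return blocks

# 32x32 test image: flat gray, except a busy 8x8 checkerboard in one corner.
img = [[128] * 32 for _ in range(32)]
for j in range(8):
    for i in range(8):
        img[j][i] = 255 if (i + j) % 2 else 0

blocks = choose_blocks(img, 0, 0, 32)
for block in blocks:
    print(block)
```

On this image the flat regions stay as 16x16 blocks while the quadrant containing the checkerboard is split down to 8x8, so the fine partitioning only happens where the detail is.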

JPEG XL achieves about half the bitrate of an equal-quality JPEG, even at lower quality levels. That's a real achievement, but the complexity cost is high; I'd estimate that JPEG XL decoders are at least ten times more complex than JPEG decoders. Modern lossy image codecs are "JPEG, with three decades of patch notes" :-)

I think we're badly in need of an entirely new image compression technique; the block-based DCT has serious flaws, such as its high coding cost for edges and its tendency to create block artefacts. The modern hardware landscape is quite different from 1992, so it's plausible that the original researchers might have missed something important, all those years ago.

The really big problem with blocking is that it introduces very visible artifacts in dark backgrounds and that they're a type of artifact that draws your attention to them. Part of the problem here is that 8-bit sRGB isn't quite sufficient to prevent visible banding without dithering in dark regions, so when you add blocking artifacts to already slightly visible banding, the result turns into a jagged, attention-grabbing mess.
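A quick way to see why the dark end is the weak spot: the sketch below (Python, my own illustration) uses the standard sRGB transfer function to compute how big the jump in linear light is between two adjacent code values, as a fraction of the darker one. The sample code values are arbitrary, and `code * 4` is only an approximate 10-bit equivalent of an 8-bit code.

```python
def srgb_to_linear(v):
    """sRGB transfer function: encoded value in [0, 1] -> linear light in [0, 1]."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4

def relative_step(code, bits):
    """Fractional jump in linear light between `code` and `code + 1`."""
    top = (1 << bits) - 1
    lo = srgb_to_linear(code / top)
    hi = srgb_to_linear((code + 1) / top)
    return (hi - lo) / lo

for code8 in (5, 10, 20, 50, 200):
    step8 = relative_step(code8, 8)
    step10 = relative_step(code8 * 4, 10)  # roughly the same brightness
    print(f"8-bit code {code8:3d}: step {step8:6.1%}   10-bit: {step10:6.1%}")
```

The relative step between neighbouring 8-bit codes is several times larger near black than in the highlights, and going to 10 bits shrinks it accordingly, which is the headroom the banding argument is about.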

Deblocking is inelegant, but blur is a much less noticeable artifact than blocks. That said, the best answer turns out to be having the input image in 10 bit and having encoders/decoders work at higher internal bit depths, which allows the encoder to make smarter choices about what detail is real and gives the decoder some info from which it can more intelligently dither the decoded image.

IIUC AV2 is trying to resurrect the Daala deblocking work. I think JPEG XL also has some good stuff here (but I don't remember exactly what).