I remember that almost 20 years ago, I played with a utility that allowed manual JPEG optimization by painting regions with desired quality settings. I think it was included on a CD bundled with "Magazyn internetowy WWW" [1]. Does anyone remember such a program?
This is OK for icon sized images, but it hurts me when I read webpages that incorporate photographs or visualizations in low resolution. They keep saying “Web resolution” and talk about something like 200x200 images maximum.
Even for the past 20 years, when I read webpages with images, I inspect them closely. Often it’s something like a newspaper article with photographs or Wikipedia. I specifically set my Wikipedia settings to deliver me images at the highest possible resolution and now I can actually enjoy reading it. Specifically I use the Timeless skin and set the thumbnail size to the maximum.
Sometimes I come across webpages that describe something like historic trains and all they have are icon sized photographs of them. It’s so sad.
A thing I'd love to try if I had time: a compressor which derives the best table for the image. I'm imagining a loop of: start with the default, compress, calculate the difference from the original, DCT the error to see which patterns are missed, adjust the table, repeat. Stop on some given error/size-increase ratio.
(yes, I'm trying to get someone else nerd sniped into doing this)
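For anyone tempted, here is a rough sketch of that loop in plain Python. It is only an illustration under several assumptions that are not from the thread: the starting table is a flat stand-in rather than the real default, the "size" is a crude log2 proxy instead of actual entropy coding, and the names `tune_table`, `evaluate` and `min_gain_per_bit` are made up. Because the DCT used here is orthonormal, the DCT of the pixel error equals the error of the coefficients, so the sketch measures the error directly in the coefficient domain.

```python
import math
import random

N = 8

def dct_matrix():
    """Orthonormal 8x8 DCT-II basis matrix."""
    m = []
    for k in range(N):
        c = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        m.append([c * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)])
    return m

C = dct_matrix()

def dct2(block):
    """2-D DCT of an 8x8 block: C * block * C^T."""
    tmp = [[sum(C[i][k] * block[k][j] for k in range(N)) for j in range(N)] for i in range(N)]
    return [[sum(tmp[i][k] * C[j][k] for k in range(N)) for j in range(N)] for i in range(N)]

def total(err):
    return sum(map(sum, err))

def evaluate(coeff_blocks, table):
    """Return (squared error per coefficient position, size proxy) for a table."""
    err = [[0.0] * N for _ in range(N)]
    bits = 0.0
    for coeffs in coeff_blocks:
        for i in range(N):
            for j in range(N):
                q = round(coeffs[i][j] / table[i][j])
                err[i][j] += (coeffs[i][j] - q * table[i][j]) ** 2
                bits += math.log2(1 + abs(q))  # crude stand-in for entropy coding
    return err, bits

def tune_table(blocks, start_table, min_gain_per_bit=0.5, max_iters=200):
    """Greedily lower the step of the worst-reconstructed coefficient position."""
    coeff_blocks = [dct2(b) for b in blocks]
    table = [row[:] for row in start_table]
    err, bits = evaluate(coeff_blocks, table)
    for _ in range(max_iters):
        candidates = [(err[i][j], i, j) for i in range(N) for j in range(N) if table[i][j] > 1]
        if not candidates:
            break
        _, i, j = max(candidates)  # the pattern that is currently missed the most
        trial = [row[:] for row in table]
        trial[i][j] = max(1, trial[i][j] * 3 // 4)
        new_err, new_bits = evaluate(coeff_blocks, trial)
        gain = total(err) - total(new_err)
        cost = new_bits - bits
        # stop once the error reduction no longer justifies the size increase
        if gain <= 0 or (cost > 0 and gain / cost < min_gain_per_bit):
            break
        table, err, bits = trial, new_err, new_bits
    return table

# Synthetic test data: noisy gradients, centred around zero like level-shifted pixels.
random.seed(0)
blocks = [[[x * 4 + y * 2 + random.uniform(-3, 3) - 64 for x in range(N)]
           for y in range(N)] for _ in range(16)]
flat = [[16] * N for _ in range(N)]  # hypothetical starting table, not the libjpeg default
tuned = tune_table(blocks, flat)
```

A real version would have to plug in an actual encoder for the size measurement and a perceptual metric for the error, which is roughly where Guetzli spends its time.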
Guetzli was already mentioned and roughly does what you are talking about.
MozJPEG [1] includes several quantization tables that are optimized for different contexts (see the quant-table flag and source code for specific tables[2]), and the default quantization table has been optimized to outperform the recommended quantization tables in the original JPEG spec (Annex K).
It's also worth noting that MozJPEG uses Trellis quantization [3] to help improve quality without a per-image quantization table search. Basically rather than determining an optimal quantization table for the image, it minimizes rate distortion on a per-block level by tuning the quantized coefficients.
Both the SSIM and PSNR tuned quantization tables (2 and 4) provided by MozJPEG use a lower value in the first position of the quantization just like this article suggests (9 and 12 vs the libjpeg default of 16).
> MozJPEG use a lower value in the first position of the quantization just like this article suggests
It has a lower value in the first position of the base table, i.e. the table which is used for q=50. With lower qualities this value scales up. This delays color banding from q=50 to roughly q=40; after that the same effect appears.
Guetzli is really hamstrung by its resource usage. When it first hit the news I tried it out, and compressing a full quality JPEG from my phone could take 20-30 minutes on an i7.
It depends on the context. If I'm converting a whole library of photos I wouldn't use it. But I've got a big hero area JPEG that's loaded as one of the first resources - I'm happy to run this tool in the background for a day to make it 20% smaller.
I love this and would have dearly needed it like 5 years ago. Now, it is still a very interesting read.
But given what we have already seen from Nvidia on video compression [0], I think within the next few years, we will move everything to machine-learning-'compressed' images (aka transmitting a super-low-res seed image and some additional ASCII and having an ML model reconstruct and upscale it at the client side).
Most images are still JPEG (3 decades old) or PNG (2.5 decades old). Countless better formats have been developed, but with the exception of WEBP we are still using the same image formats that existed during the dot-com bubble. Ubiquity trumps improvements in image size.
Better encoders for JPEG or PNG are the main avenue for achieving improvements without compatibility problems, and I think that will stay true for another decade, if not more.
Lossy compression is one thing, but having an ML model suggest what the pixels should look like is a totally different thing from applying a mathematical formula.
Image -> mathematical formula to toss data -> reverse formula -> slightly altered image
vs
Image -> mathematical formula to toss data -> ML to recreate what it thinks is supposed to be there -> made up image based on "training" data not even from original image
But the end result doesn't have to be the direct output of an ML hallucination. The AI encodes a probability distribution, and you can treat it like motion compensation in video codecs: what comes next is a convolution by the encoded error between the predicted outcome and the ground truth.
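To make the "prediction plus encoded error" idea concrete, here is a toy sketch. Everything in it is an assumption for illustration: `predict` is a trivial upscaler standing in for the ML model, the signal is 1-D with a length divisible by `factor`, and `step` is a hypothetical residual quantizer. The point is only that the decoder's output is bounded by the transmitted residual, not by whatever the predictor makes up.

```python
def predict(seed, factor):
    """Stand-in for the ML model: nearest-neighbour upscaling of a 1-D seed."""
    return [v for v in seed for _ in range(factor)]

def encode(original, factor, step):
    # low-resolution seed: rounded average of each group of `factor` samples
    seed = [round(sum(original[i:i + factor]) / factor)
            for i in range(0, len(original), factor)]
    prediction = predict(seed, factor)
    # the residual is the error between the prediction and the ground truth
    residual = [round((o - p) / step) for o, p in zip(original, prediction)]
    return seed, residual

def decode(seed, residual, factor, step):
    prediction = predict(seed, factor)
    return [p + r * step for p, r in zip(prediction, residual)]

original = [10, 12, 11, 13, 50, 52, 51, 49]

# step=1 keeps the full residual, so the reconstruction is exact
seed, residual = encode(original, 4, 1)
exact = decode(seed, residual, 4, 1)

# step=4 quantizes the residual, so each sample is off by at most step/2
seed4, residual4 = encode(original, 4, 4)
approx = decode(seed4, residual4, 4, 4)
```

A better predictor does not change the reconstruction guarantee; it only makes the residual smaller and therefore cheaper to store.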
So how is that different from motion estimation as it currently stands? That at least sees where pixels are and then where they will be. So instead of storing all of that data, just store where they start and then end and then tween the diff. Isn't that what this "new" ML you just described does, but "different" by slapping "trained ML/AI" on it?
The original JPEG, retroactively named JPEG1, totally lacked loop filters, and the quantization factor of the DC coefficient matters much more than in modern formats. As an example, libjpeg q89 is noticeably worse than q90 because the DC quantization factor changes from 3 to 4 (a smaller factor means less quantization and thus higher quality), quite a big jump.
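The jump is easy to check by hand. The sketch below mirrors the scaling arithmetic libjpeg uses (the formulas from `jpeg_quality_scaling` and `jpeg_add_quant_table`, as far as I remember them); the Python function names are my own, and 16 is the DC entry of the standard luminance base table.

```python
def libjpeg_scale_factor(quality: int) -> int:
    """Percentage scale factor derived from the 1..100 quality setting."""
    quality = max(1, min(100, quality))
    if quality < 50:
        return 5000 // quality
    return 200 - 2 * quality

def scaled_step(base: int, quality: int) -> int:
    """Quantization step for one base-table entry at the given quality."""
    step = (base * libjpeg_scale_factor(quality) + 50) // 100
    return max(1, min(255, step))  # baseline JPEG keeps steps in 1..255

BASE_LUMA_DC = 16  # DC entry of the standard luminance base table

for q in (91, 90, 89, 88):
    print(q, scaled_step(BASE_LUMA_DC, q))
```

Going from 3 to 4 is a 33% coarser DC step from a single quality notch, which is why the difference is visible in smooth areas.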
There's also webp and heif and png and svg, and I believe all the existing formats already solve the image compression problem. The difference of 18kb vs 22kb from hours of microoptimisations is frankly irrelevant given the rate of networks getting faster.
Don't tell that to NFT people who want to put everything on-chain but have to sell their house in gas fees to store a small jpg on Ethereum!
It's like the 70s all over again, but with blockchain instead of floppy disks.
(I've seen a few of them discuss better compression algorithms because they really felt it was extremely useful and meaningful to store the actual data on-chain and not an IPFS link like they usually do)
The simple question: if all the existing formats already solve the image compression problem, why do new image formats (WebP, HEIC, AVIF, JPEG XL etc.) keep appearing?
>Q15+Fix vs QL18) is hardly distinguishable to me.
What can I say, it is very distinguishable to me, at least for images that contain larger "homogeneous" areas like skies.
>In essence, the author discovered how to make JPEG images look better by increasing their file size.
This is not what the article is about. E.g. the first image with the large blue sky area, the result was a file size that was halved for Q15+fix compared to the Q50 source, and the Q18 comparison image at the same file size as Q15+fix looks like crap.
So the author got vastly more visual quality for the same or similar file size, while still producing valid jpegs.
It might not matter that much in the grand scheme of things, but it probably matters a lot to the company he is working for, which specializes in image processing, compression and delivery as a service it seems (bandwidth and traffic are cheap, but not free). And it will probably matter at least for some of their customers as well.
It probably won't matter much if you are e.g. on reddit (or are reddit), and that post with that 90kb jpeg (which could have been maybe 50-60 kb with the optimizations mentioned in the article) pulled in 10.1 MB of other crap (wire size) in the 30 seconds the page was open. With an ad-blocker active. Yes, I just ran this very unscientific test.
In the future, other formats like webp (already somewhat widely deployed), avif (browser support is getting there) or jpeg-xl (very promising results per watt compared to avif and sometimes webp, with a nifty lossless jpeg<->jxl mode) - but probably not heif because of the patent situation - might become more dominant, but for the time being a lot of images online and offline will remain jpeg and produced as jpegs.
(png and svg the grandparent poster brought up are for other use cases, btw, and offer lousy to untenable compression for photographs)
It was not badly picked for regular use cases. It's just that before retina displays appeared, no one was interested in extremely low bitrates, and no one knew that different artifacts have a different impact at high pixel density.
> the employed solution only modifies first element from [16,17] to [10,16].
Correction: 16 and 17 are values from the base tables, which means this table is used with q=50. With q=25 it will be [32, 22, 24, 28, 24, 20, 32, 28…] (in zigzag order). The employed solution is to always limit the first value by 10 regardless of q: [10, 22, 24, 28, 24, 20, 32, 28…]
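A small sketch to reproduce those numbers. It assumes the standard luminance base table from Annex K and the libjpeg quality scaling; `cap_dc` and the other function names are hypothetical and only illustrate the "limit the first value to 10" idea, not the article's actual code.

```python
# Standard luminance base table (JPEG spec, Annex K), in natural row order.
BASE_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def zigzag(table):
    """Flatten an 8x8 table in zigzag order (odd diagonals run top to bottom)."""
    cells = [(r, c) for r in range(8) for c in range(8)]
    cells.sort(key=lambda rc: (rc[0] + rc[1],
                               rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
    return [table[r][c] for r, c in cells]

def scale(values, quality):
    """Scale base values the way libjpeg does for a 1..100 quality setting."""
    quality = max(1, min(100, quality))
    factor = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [max(1, min(255, (v * factor + 50) // 100)) for v in values]

def cap_dc(values, limit=10):
    """Hypothetical helper: clamp only the first (DC) step, leave the rest."""
    return [min(values[0], limit)] + values[1:]

q25 = scale(zigzag(BASE_LUMA), 25)
print(q25[:8])          # first eight steps at q=25
print(cap_dc(q25)[:8])  # same table with the DC step limited to 10
```

At q=50 the scale factor is 100%, so the table comes out unchanged and the DC step is the familiar 16.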
mozjpeg chose a different approach: it still scales all values based on q, but has significantly changed the default base table. It helps, but doesn't eliminate color banding completely (you can still see it on the example from issues/76).
The tables were good for the time they were picked. At the beginning, the article shows that at low resolutions both ringing and banding are unacceptable. At high resolutions, beyond what was considered normal when the original quantization tables were chosen, then ringing becomes much less of a problem, so it makes sense now to change the quantization tables to prioritize reducing banding over reducing ringing.
It is no secret that high resolution images and low resolution images compress differently, and modern codecs are optimized for high resolutions in a way that older codecs weren't. For example, going from H.261 to H.266 improved video compression across the board at every step, but the gain is most apparent at higher resolutions.
> At high resolutions, beyond what was considered normal when the original quantization tables were chosen, then ringing becomes much less of a problem
Slightly more precise: ringing becomes much less of a problem in high resolution images when viewed on a high resolution screen (in Apple language: retina).
The difference is that the kraken 'about technology' website [0] gives no useful information as to how they are doing it (I guess it's 'proprietary information'), while this article gives a very detailed description of how to compress jpegs.
In other words: end users use kraken, developers read this article.
I can take a guess at that "proprietary information". Most image optimization sites are just thin wrappers over something like mozjpeg/zopflipng/etc. 1% tech, 99% marketing
> Ok, but where does this table come from when we need to save a file? It would be a big complication if you had to construct, and transmit 64 independent numbers as a parameter. Instead, most encoders provide a simple interface to set all 64 values simultaneously. This is the well known “quality,” which value could be from 0 to 100. So, we just provide the encoder desired quality and it scales some “base” quantization table. The higher quality, the lower values in quantization table.
I never really thought about how that "quality" slider worked (besides making the compression lossier), but it makes perfect sense now! It always amazes me how much I take for granted.
I always treat compression like a black box: "-crf 23" for H264, PNG and FLAC are nice but MP3 320s and 90+ "quality" JPEGs are good compromises, etc. And that's just for the stuff I deal with, there's no telling how much lossy compression goes on behind the scenes on my own computers, let alone all the stuff served up over the internet. There's so much lossy compression in the world, from EP speed on VHS tapes to websites reencoding uploaded images to every online video ever, it's crazy to think about.
Though its impact may be limited, this is some nice work!
Now could someone look at how video codecs can produce excellent high-detail scenes and motion in 4k resolution while at the same time making a blocky mess out of soft gradients, especially in dark scenes with slow movement?
For a screenshot? Yeah JPEG sucks. For a diagram with large blocks of uniform color and sharp edges? Yeah JPEG sucks.
But for pictures, JPEG is in its comfort zone and it's amazing it is still doing comparatively well after the IT equivalent of 150 years. Only now are worthy alternatives starting to emerge (looking at you JPEG XL), not for lack of trying (looking at you WEBP). It's incredible it managed to stay relevant for so long, and while surely patents, sunk cost fallacies, hardware implementations and inertia played a part, none of this would have mattered if it hadn't been pretty good to start with.
> none of this would have mattered if it hadn't been pretty good to start with.
Considering that MPEG (and its competitors) evolved because of its deficiencies (today's baseline is H.264, not the original H.261 or even its immediate successor MPEG-1), I'm surprised that JPEG is only showing its deficiencies today and not in 2000. Actually, it's a compliment that although multiple file formats were invented to handle lossy pictures, even WebP can't beat JPEG all the time (especially since WebP can only save up to 16k pixels per side while JPEG can handle up to 64k pixels).
WebP & JPEG XL compress losslessly better than PNG and lossy much better than JPEG. Perhaps not perfect either, but we finally do have formats that can do both — and better than either before.
Do note that if you need to target macOS 10 + Safari, WebP is not available.
WebP is available for macOS 11, even with overlapping versions of Safari, but Apple relies on the OS image library to render some images and they haven't backported WebP support when they updated Safari.
Please, these kinds of comments are entirely counterproductive and don't add anything to the conversation. No matter the topic, presume some people are truly interested in it, and feel free to stay off it.
[1] https://sprzedajemy.pl/www-magazyn-internetowy-2003-nr-04-03... - incidentally this must be the very issue, see the teaser about image optimization