by okramcivokram 474 days ago
I don't see (or maybe don't recognize) any issues with the image that they're talking about (ringing, color shift, or fake details). It most certainly doesn't look awful to me, it looks exactly the same, only a bit sharper.

You can see some ringing in the sky around the trees and on the line between the crow's beak/feathers if you look closely. (Alvin's?) fur goes much farther down his forehead as well. People who work deeply with codecs are usually hypersensitive to these sorts of issues that mere mortals like us need to try to see.

There used to be a legendary blog called "Diaries of an x264 developer" by Fiona Glaser [0] where she'd go on long rants about various ways to cheat in encoder comparisons [1], much like this.

[0] https://web.archive.org/web/2012/http://x264dev.multimedia.c...

[1] https://web.archive.org/web/20130215095527/http://x264dev.mu...

Appreciate this, I was feeling the same way as the original comment. It looks maybe over-sharpened, but I don't see anything as glaring as the text of the article makes it sound! (Of course, I'm not a video codec developer.)

It does remind me of how stereo & speaker manufacturers sometimes boost treble a little bit (rather than being perfectly "transparent" to the original signal) because it gives the impression of clarity. But ideally each step in the processing chain "colors" the signal as little as possible, because those little differences can add up.

Yeah, audio response curves have always been a bit confusing to me. Like, they say that headphones should use a Harman curve because that sounds 'best' to listeners, but how valid is it as an objective measure? (E.g., will listeners 50 years from now find a different curve 'better', the same way that instrument tuning has changed over centuries?) And how much of it is responding to current practices in recording and mixing?

Of course, you won't get a sound as if you're in the same room (without a very fancy setup), so you'll generally want some sort of transformation to get an acceptable output. And artists often want to aim for a certain effect on top of that. But with how things currently are, many of the decisions going into the final sound are very opaque.

> Like, they say that headphones should use a Harman curve because that sounds 'best' to listeners, but how valid is it as an objective measure?

It should be valid because it's "neutral". IIRC it's basically a conversion to simulate how a neutrally tuned speaker would sound if you were in the same room.

There are many reasons objective headphone measurements aren't actually objective for you though. The biggest one is that they're taken in a silent room, so a single CPU fan or anything near you makes it invalid. Noise cancelling can mean a lot in practice.

The other reasons are that different people have different ear shapes, some people wear glasses so the headphones can't get a seal, your amp isn't electrically compatible with the headphone, your music is badly mastered so you prefer a headphone badly tuned the opposite way, etc.

> It should be valid because it's "neutral". IIRC it's basically a conversion to simulate how a neutrally tuned speaker would sound if you were in the same room.

Is it, though? Blogspam posts about it waffle over the exact definition, but Olive's original post [0] gives the methodology, "A panel of 10 trained listeners rated each headphone based on overall preferred sound quality, perceived spectral balance, and comfort," and a later Harman post seems to cite the original methodology without comment [1].

Unless the subjective part was just to select between different headphones that had been calibrated to simulate neutral speakers? The posts don't make it entirely clear where the curves originally came from.

[0] https://seanolive.blogspot.com/2013/04/the-relationship-betw...

[1] https://pro.harman.com/insights/akg/defining-the-standard-th...

The thing that gets me about audio is people obviously have different ears. Some are more sensitive to high frequencies, etc. It's even age-dependent. It's like salt preference on food.
The color shift was the most obvious to me. The other artifacts may not be super visible, but if you care about preserving the original picture, it’s certainly an important quality difference.
In the era of x264, we (the users and enthusiast communities) and the x264 developers cared deeply about preserving the original content as much as possible, even the noise and artefacts.

That was in the 00s. It was not the encoder's job to remove or filter out details, background or not. There are some caveats to that, but it held comparatively speaking, at least next to what else was around at the time, say RMVB from Real Media or WMV.

Worth remembering it wasn't really the internet era back then. People encoded so they could fit more onto CDs and DVDs. At least that was how it started.

Somewhere along the line the internet, or rather the mobile internet (aka the iPhone), happened. Now everyone watches on a small screen. With all the details washed out, people just want to consume. None of the details matter. What we used to do only in AviSynth filters is now done automatically by the encoder. The 10-minute YouTube video doesn't care about any of that. Then came the 3-minute video, and now the attention span is basically a 30-second TikTok or Instagram Reel. Worst of all, a lot of this attention to detail is also gone from Netflix and other long-form movie streaming. VMAF 90 is good enough, let's try to minimise the bitrate as much as possible.

We need higher / best quality at minimal bitrate, instead of having bare minimum / good enough quality at lowest possible bitrate. The two are very different.

The march of the internet and the tech giants on video codecs means that someday we may lose out. Somewhat fortunately, we still have a small group of old-timers in movie production, the broadcast industry, and private torrent release groups who still care about these things.

Hopefully, someday, we (especially in the West) can go back to celebrating greatness rather than mediocrity.

I agree the "negative" artifacts are almost impossible to see, and came here to the comments to see what the heck the author was talking about.

> People who work deeply with codecs are usually hypersensitive to these sorts of issues that mere mortals like us need to try to see.

I think that kind of shows that the author is unfairly critical.

They're saying "this should not have shipped", when it seems just fine to us "mere mortals".

Yes, video encoding requires using your eyes. But it also seems like it should use normal eyes, not hypersensitive eyes...?

Also video is viewed in motion, not as static frames. And end-users watching on low bitrates aren't going to freeze-frame and zoom in.
That is true. But why would you replace a simple filter that is known to work with this thing that yields worse performance? There are no upsides to this, and that is why the domain expert is baffled. We have been making filters like this for a long time, and it is known how they should be evaluated, yet this thing is still touted as state of the art by its creators. If the image quality is lower with this filter and you say that it does not matter, then the quality of the stream is too high. This filter is not going to solve that.
It apparently did better with users' subjective evaluations. I guess they liked the "extra detail" look, even if it's fake. End users aren't zooming in to freeze frames or, heaven forbid, taking screenshots (think of the copyright!!)
End users aren't domain experts. That's like if I was allowed on the factory floor where my favorite products are made so that I could make subjective, uninformed decisions about the manufacturing process which also affect everyone else.
Yeah, but there are classical algorithms with tunable sharpening that are cheaper and widely known (e.g. Catmull-Rom or the "Magic Kernel Sharp" algorithm by John Costella).
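
To make the "tunable sharpening" point concrete, here is a minimal sketch of the classic cubic convolution kernel, assuming the usual one-parameter form where `a = -0.5` gives Catmull-Rom. The function names and the choice of `a` as the sharpness knob are mine, for illustration only, and this is not the Magic Kernel Sharp algorithm. A more negative `a` deepens the negative lobes, which sharpens more and overshoots more at edges.

```python
def cubic_kernel(x, a=-0.5):
    """One-parameter cubic convolution kernel; a = -0.5 is Catmull-Rom."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def tap_weights(t, a=-0.5):
    """Weights for the four neighbouring samples at fractional offset t in [0, 1]."""
    return [cubic_kernel(t + 1, a), cubic_kernel(t, a),
            cubic_kernel(1 - t, a), cubic_kernel(2 - t, a)]

def interpolate(p0, p1, p2, p3, t, a=-0.5):
    """Interpolate between p1 and p2 using the two outer samples as context."""
    w = tap_weights(t, a)
    return w[0] * p0 + w[1] * p1 + w[2] * p2 + w[3] * p3

# Just after a 0 -> 1 edge the negative lobe pushes the result above 1.
print(tap_weights(0.5))
print(interpolate(0.0, 1.0, 1.0, 1.0, 0.5))
print(interpolate(0.0, 1.0, 1.0, 1.0, 0.5, a=-1.0))  # sharper, larger overshoot
```

The outer weights are negative, so even this cheap kernel overshoots slightly next to an edge; the difference is that the amount is a single explicit parameter.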

My suspicion is that none of this mattered though, because the evaluation was probably "perceptual equivalence" vs bitrate. I can easily believe it might be a marginal win over traditional algorithms from that perspective.

It does strike me that video encoding blog posts that show up here are often these kinda toxic rants that seemingly exaggerate whatever it is they're ranting about and also assume the people working on these things are complete morons for missing whatever minute detail the author is angry about.
It’s mostly only visible in the closeup of the kid. There are hairs that have been unnecessarily accentuated, and his eye and eyebrow outline look hyper-sharpened, with rough edges.

The non-zoomed image looks fine to me, and I (to some extent) know what I’m looking for. Some private torrent trackers that pride themselves on having transparent encodes will look for this kind of stuff; you have to do multiple test encodes tweaking various parameters to ffmpeg, agonizing over A/B screencaps, only to inevitably be told you either missed some minuscule detail in a single scene, or that your encode is bloated.

As someone who works a lot with sound I notice a ton of artifacts most people don't recognize. Often those artifacts are harder to hear on cheap speakers but become far more obvious with a good setup. But they also add up and while untrained ears can't hear specific examples, they do result in an overall worse experience which is especially frustrating for high end users when they paid a lot for nice speakers and it's just revealing the grunge that was always there.

That's all to say - I also could not find all the things they are talking about, probably a combination of not being trained, not working a lot with video codecs, and not having the best monitors - but I get the author's frustration, and I'm glad there are people who care about these things! But yeah, I hunted for that color shift and I'm just not seeing it...

I remember a while ago we had a thread on HN discussing this. 50% of the world can't taste the difference between Pepsi and Coca-Cola. People can't tell the difference between 128 kbps and 256 kbps MP3, etc.

For the people who are sensitive to a lot of these, it is more of a curse than a gift. Some can't taste the difference between corn-fed (or corn-finished) and grass-fed beef. The same goes for the colour shift in this article, or how the latest TVs perform across OLED, QD-OLED, four-layer OLED, and Mini-LED from different brands.

It turns out being able to "compare" is a skill set in itself. I would assume comparing is also a function that requires more brain power / energy, and most people's natural state would be to conserve that energy.

I have been thinking about this for quite some time. Most people don't know how to compare, or what to compare things to. And precisely because most people don't know how to compare or how not to compare, we need marketing. And I think most successful founders are very good at comparing things. Steve Jobs would be a prime example.

The ringing is most obvious in the striped shirt in the painting on the wall. It's added entire new stripes that don't exist in the original.
Ringing is pretty obvious to me. It has a specific meaning in this context, it means edges are over-sharpened to the point that "fake" extra edges appear.

https://en.wikipedia.org/wiki/Ringing_artifacts
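
For anyone who wants to see the "fake extra edges" in numbers, here is a rough sketch, assuming a plain unsharp mask (box blur plus difference) on a 1D step edge. The radius and amount are arbitrary values I picked, not anything from the article. This particular filter produces one halo on each side of the edge rather than repeated ripples, but it shows how sharpening pushes values outside the original range.

```python
def box_blur(signal, radius):
    """Moving-average blur; indices are clamped at the borders."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out

def unsharp_mask(signal, radius=2, amount=1.5):
    """Sharpen by adding back the difference between the signal and its blur."""
    blurred = box_blur(signal, radius)
    return [s + amount * (s - b) for s, b in zip(signal, blurred)]

edge = [0.0] * 8 + [1.0] * 8          # a clean dark-to-bright edge
sharpened = unsharp_mask(edge)

# The original never leaves [0, 1]; the sharpened version does near the edge.
print([round(v, 2) for v in sharpened])
print("overshoot:", max(sharpened) - max(edge))
print("undershoot:", min(edge) - min(sharpened))
```

In an image that overshoot shows up as a bright fringe on the light side of an edge and a dark fringe on the dark side.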

That is what it should mean, but in image compression people use it to mean "mosquito noise" artifacts, which come from quantizing DCT compression applied to edges. (Nyquist theorem = DCT is bad at edges because they're made of an infinite number of frequencies.)
That is the exact same phenomenon. Artificial sharpening is introducing high bandwidth components to a signal. If you bandwidth limit (low pass) a signal to fit below the Shannon-Nyquist limit you will get ringing as the signal cannot be represented accurately and will smear in the time domain. Given a bandwidth constraint, artificial sharpening above a certain threshold will result in ringing.
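
A quick way to check the band-limiting claim is to low-pass a step and look at the result. This is only a sketch, assuming a brick-wall filter applied through a naive DFT on a 64-sample periodic step; the cutoff of 8 bins is an arbitrary choice of mine.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def low_pass(x, cutoff):
    """Brick-wall filter: zero every frequency bin above `cutoff`."""
    n = len(x)
    spectrum = dft(x)
    kept = [spectrum[k] if min(k, n - k) <= cutoff else 0 for k in range(n)]
    return idft(kept)

step = [0.0] * 32 + [1.0] * 32        # values stay inside [0, 1]
limited = low_pass(step, cutoff=8)

# After band-limiting, samples near the edge ripple outside [0, 1].
print(f"min={min(limited):.3f} max={max(limited):.3f}")
```

The ripples on both sides of the edge are the Gibbs phenomenon, which is the textbook case of ringing from removing high frequencies.
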
Images don't have infinite bandwidth, so that doesn't apply. The filter used in H264 and newer codecs is exact and nearly reversible, there aren't artifacts from applying it. The artifacts come from the rounding afterward.
It's the exact same phenomenon in both. Not sure where you're making the distinction.
It doesn't come from "oversharpening".
Sharpening and bandwidth-limiting have the exact same effect, because the maximum sharpness of an image (like any other signal) depends on its bandwidth. There is no difference in the type of artifact produced. That's why the artifact from both has the same name of "ringing".
It doesn't come from "sharpening" at all though. That implies an increase in frequencies. This compression artifact comes from rounding them, so they're moved in both directions.
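
The rounding argument can be tried directly as well. Below is a minimal sketch, assuming a floating-point orthonormal 8-point DCT-II and one uniform quantization step of 40 for every coefficient. Real codecs use integer transforms and per-frequency quantizers, so treat the numbers as illustrative only.

```python
import math

N = 8

def scale(k):
    """Orthonormal DCT-II scaling factor."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct(x):
    """Forward 8-point DCT-II."""
    return [scale(k) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                           for n in range(N))
            for k in range(N)]

def idct(coeffs):
    """Inverse of dct()."""
    return [sum(scale(k) * coeffs[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]

def quantize(coeffs, step):
    """Round each coefficient to the nearest multiple of `step`."""
    return [step * round(c / step) for c in coeffs]

block = [0, 0, 0, 0, 100, 100, 100, 100]   # a hard edge inside one block
lossless = idct(dct(block))                 # transform alone: no artifacts
lossy = idct(quantize(dct(block), step=40))

print([round(v, 1) for v in lossless])
print([round(v, 1) for v in lossy])
```

The transform round trip reproduces the block, while the quantized version is no longer flat on either side of the edge and leaves the original 0 to 100 range. Each coefficient here is rounded up or down depending on where it falls, not uniformly boosted.
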
I don't immediately see the issue either.

Though, even if I could, this is a new way to preprocess an image before feeding it into an encoder, and the examples have both been fed through the new downsizer, then the standard encoder, presumably at the standard Netflix bitrates and then (I think) upscaled back to the original size.

So if this didn't look a little compressed then that would be a methodological mistake, as you don't use downscaled encoding unless you've already decided that a full size encode has too much quality for your task.

And Netflix generally has incentives set up to reduce quality until their customers notice. That's why they quote stats based on that.

In web dev terms it's like reducing the size of your product images until it hurts sales. You're almost guaranteed to have artifacts visible to image compression experts before you hit the point that it affects your bottom line. If you are targeting customers on slow internet (and again if you are downscaling then you basically are) your sales are likely to initially rise as you get usable pictures to people faster.

The fact that it looks different is the problem.