You can see some ringing in the sky around the trees and on the line between the crow's beak/feathers if you look closely. (Alvin's?) fur goes much farther down his forehead as well. People who work deeply with codecs are usually hypersensitive to these sorts of issues that mere mortals like us need to try to see.
There used to be a legendary blog called "Diary of an x264 Developer" by Fiona Glaser [0] where she'd go on long rants about various ways to cheat in encoder comparisons [1], much like this.
Appreciate this, I was feeling the same way as the original comment. It looks maybe over-sharpened, but I don't see anything as glaring as the text of the article makes it sound! (Of course, I'm not a video codec developer.)
It does remind me of how stereo & speaker manufacturers sometimes boost treble a little bit (rather than being perfectly "transparent" to the original signal) because it gives the impression of clarity. But ideally each step in the processing chain "colors" the signal as little as possible, because those little differences can add up.
Yeah, audio response curves have always been a bit confusing to me. Like, they say that headphones should use a Harman curve because that sounds 'best' to listeners, but how valid is it as an objective measure? (E.g., will listeners 50 years from now find a different curve 'better', the same way that instrument tuning has changed over centuries?) And how much of it is responding to current practices in recording and mixing?
Of course, you won't get a sound as if you're in the same room (without a very fancy setup), so you'll generally want some sort of transformation to get an acceptable output. And artists often want to aim for a certain effect on top of that. But with how things currently are, many of the decisions going into the final sound are very opaque.
> Like, they say that headphones should use a Harman curve because that sounds 'best' to listeners, but how valid is it as an objective measure?
It should be valid because it's "neutral". IIRC it's basically a conversion to simulate how a neutrally tuned speaker would sound if you were in the same room.
There are many reasons objective headphone measurements aren't actually objective for you though. The biggest one is that they're taken in a silent room, so a single CPU fan or anything near you makes it invalid. Noise cancelling can mean a lot in practice.
The other reasons are that different people have different ear shapes, some people wear glasses so the headphones can't get a seal, your amp isn't electrically compatible with the headphone, your music is badly mastered so you prefer a headphone badly tuned the opposite way, etc.
> It should be valid because it's "neutral". IIRC it's basically a conversion to simulate how a neutrally tuned speaker would sound if you were in the same room.
Is it, though? Blogspam posts about it waffle over the exact definition, but Olive's original post [0] gives the methodology, "A panel of 10 trained listeners rated each headphone based on overall preferred sound quality, perceived spectral balance, and comfort," and a later Harman post seems to cite the original methodology without comment [1].
Unless the subjective part was just to select between different headphones that had been calibrated to simulate neutral speakers? The posts don't make it entirely clear where the curves originally came from.
The thing that gets me about audio is people obviously have different ears. Some are more sensitive to high frequencies, etc. It's even age-dependent. It's like salt preference on food.
The color shift was the most obvious to me. The other artifacts may not be super visible, but if you care about preserving the original picture, it’s certainly an important quality difference.
In the era of x264, we (the users and the enthusiast communities) and the x264 developers cared deeply about preserving the original content as much as possible, even the noise and artefacts.
That was in the 00s. It is not the encoder's job to remove or filter out all the details, background or not. There are some caveats to that, but that was comparatively speaking at the time, next to, say, RMVB from RealMedia or WMV.
Worth remembering it wasn't really the internet era back then. People encoded so they could fit more onto CDs and DVDs. At least that was how it started.
Somewhere along the line the Internet, or rather mobile internet (aka the iPhone), happened. Now everyone watches on a small screen. With all the details washed out, people just want to consume. None of the details matter. What we used to do only in AviSynth filters is now done automatically by the encoder. The 10-minute YouTube video doesn't care about any of that. Then it was 3 minutes, and now the attention span is basically a 30-second TikTok or Instagram Reel. Worst of all, a lot of this attention to detail is also gone on Netflix and other long-form movie streaming. VMAF 90 is good enough, let's try to minimise the bitrate as much as possible.
We need higher / best quality at minimal bitrate, instead of having bare minimum / good enough quality at lowest possible bitrate. The two are very different.
The march of the internet and the tech giants on video codecs means that someday we may lose out. Somewhat fortunately, we still have a small group of old people in movie production, the broadcast industry, and private torrent release groups who still care about these things.
Hopefully, someday, we, especially the West, can go back to celebrating greatness rather than mediocrity.
That is true. But why would you replace a simple filter that is known to work with this thing that yields worse performance? There are no upsides to this, and that is why the domain expert is baffled. We have been making filters like this for a long time and it is known how they should be evaluated, yet this thing is still touted as state of the art by its creators. If the image quality is lower with this filter and you say that it does not matter, then the quality of the stream is too high. This filter is not going to solve that.
It apparently did better with users' subjective evaluations. I guess they liked the "extra detail" look, even if it's fake. End users aren't zooming in to freeze frames or, heaven forbid, taking screenshots (think of the copyright!!)
End users aren't domain experts. That's like if I was allowed on the factory floor where my favorite products are made so that I could make subjective, uninformed decisions about the manufacturing process which also affect everyone else.
Is it? Or is it like if you were allowed into a focus group where they let you try out two manufactured products, so that you could make subjective, uninformed decisions about the product lineup which also affect everyone else?
Yeah, but there are classical algorithms with tunable sharpening that are cheaper and widely known (e.g. Catmull-Rom or the "Magic Kernel Sharp" algorithm by John Costella).
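For anyone who hasn't seen one of these classical kernels written out, here is a rough sketch of Catmull-Rom resampling in one dimension. It assumes the usual cubic convolution form (the Keys kernel with a = -0.5); the function names and the step-edge test signal are made up for illustration and aren't taken from the article or any particular library. The negative lobes of the kernel are what give the mild sharpening, and also the small overshoot on hard edges.

```python
import math

def catmull_rom_weight(x: float) -> float:
    """Cubic convolution kernel with a = -0.5, commonly called Catmull-Rom."""
    x = abs(x)
    if x < 1.0:
        return 1.5 * x**3 - 2.5 * x**2 + 1.0
    if x < 2.0:
        # Negative lobe: this is where the sharpening (and overshoot) comes from.
        return -0.5 * x**3 + 2.5 * x**2 - 4.0 * x + 2.0
    return 0.0

def resample(signal, factor):
    """Upsample a 1-D signal by an integer factor, clamping at the borders."""
    n = len(signal)
    out = []
    for i in range(n * factor):
        # Map the centre of output sample i back into source coordinates.
        src = (i + 0.5) / factor - 0.5
        base = math.floor(src)
        acc = 0.0
        # Four taps: two source samples on each side of the position.
        for k in range(base - 1, base + 3):
            weight = catmull_rom_weight(src - k)
            acc += weight * signal[min(max(k, 0), n - 1)]
        out.append(acc)
    return out

# Hypothetical test signal: a hard step edge, like a branch against the sky.
edge = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
upsampled = resample(edge, 2)
print(min(upsampled), max(upsampled))
```

On the step edge the output dips slightly below 0 and rises slightly above 1 next to the transition, which is the same kind of halo people describe as ringing, just at a much smaller and more predictable scale.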
My suspicion is that none of this mattered though, because the evaluation was probably "perceptual equivalence" vs bitrate. I can easily believe it might be a marginal win over traditional algorithms from that perspective.
It does strike me that video encoding blog posts that show up here are often these kinda toxic rants that seemingly exaggerate whatever it is they're ranting about and also assume the people working on these things are complete morons for missing whatever minute detail the author is angry about.