
by simias 1093 days ago

Ten years or so ago I was working on a video chip that had an upscaler feature. While prototyping and simulating it, we first started by applying a mathematically-correct (i.e. information preserving) FIR filter to do the upscale. Then we compared the result with other solutions and found that ours looked worse. We asked our colleagues to blind-test it and they all picked third-party-scaled images over ours.

At first we assumed that we must have had a bug somewhere because the Fourier transform told us that our approach was optimal, but after more testing everything matched the expected output. Yet it looked worse.

So we started reverse-engineering the other solutions and, long story short, what they did better is that they added some form of edge-enhancement to the upscaling. Information-theory-wise it actually degraded the image, but subjectively the sharper outlines were just so much nicer to look at and looked correct-er. You felt like you could more easily tell the details even though, again, in a mathematical sense you actually lost information that way.
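
To make the effect concrete, here is a minimal 1D sketch in Python. It is not the chip's pipeline or any third-party scaler (neither is described here); linear interpolation stands in for the upscaler and a simple unsharp mask stands in for "some form of edge-enhancement". The function names and the `amount` parameter are made up for illustration.

```python
def upscale_linear(signal, factor):
    """Upscale a 1D signal by an integer factor using linear interpolation."""
    out = []
    for i in range(len(signal) - 1):
        a, b = signal[i], signal[i + 1]
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(signal[-1])
    return out


def box_blur3(signal):
    """3-tap box blur; border samples are repeated at the ends."""
    n = len(signal)
    return [
        (signal[max(i - 1, 0)] + signal[i] + signal[min(i + 1, n - 1)]) / 3
        for i in range(n)
    ]


def unsharp_mask(signal, amount=1.0):
    """Hypothetical edge enhancement: add back the detail the blur removed."""
    blurred = box_blur3(signal)
    return [s + amount * (s - b) for s, b in zip(signal, blurred)]


def max_step(signal):
    """Largest jump between neighbouring samples (a crude sharpness measure)."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))


edge = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
upscaled = upscale_linear(edge, 2)
enhanced = unsharp_mask(upscaled, amount=1.0)

print("upscaled:", [round(v, 3) for v in upscaled])
print("enhanced:", [round(v, 3) for v in enhanced])
print("steepest step:", max_step(upscaled), "->", round(max_step(enhanced), 3))
```

The enhanced edge is steeper, but it also dips below 0 and overshoots above 1 on either side of the transition. Those values were never in the input, which is one way to read "degraded the image" while still looking sharper.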

I don't think it makes a lot of sense to reduce human vision to edge detection (we can still make sense of a blurry image like this one after all: https://static0.makeuseofimages.com/wordpress/wp-content/upl... ) but it's clear to me from empirical evidence that edge-detection is a core aspect of how we parse visual stimuli.

As such I'm a bit confused as to why the author seems to see this as a binary proposition. That being said, I could just be completely misunderstanding the point the author is trying to make.

10 comments

cameldrv 1093 days ago

I don't think it's just subjective in this case. The theoretical signal processing approach assumes that the signal is band-limited, with no components whose period is shorter than two pixels, and it's not. There are lots of sharp edges that have higher frequency components than that.
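
A quick way to see what happens when that assumption is violated: the sketch below applies textbook Whittaker-Shannon (sinc) interpolation to a sampled step edge in plain Python. The signal and the half-sample evaluation points are arbitrary choices for illustration, not anything taken from the chip discussed above.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def sinc_interpolate(samples, t):
    """Band-limited reconstruction of the sampled signal at position t."""
    return sum(s * sinc(t - n) for n, s in enumerate(samples))


# A hard edge: not band-limited, so the reconstruction cannot follow it exactly.
step = [0.0] * 8 + [1.0] * 8

# Evaluate halfway between the original samples (a 2x upscale).
half_points = [n + 0.5 for n in range(len(step) - 1)]
values = [sinc_interpolate(step, t) for t in half_points]

for t, v in zip(half_points, values):
    print(f"t={t:4.1f}  value={v:+.3f}")

print("min:", round(min(values), 3), "max:", round(max(values), 3))
```

Around the edge the reconstructed values swing below 0 and above 1. That ringing is the filter behaving exactly as designed for a band-limited input; the input just isn't one.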

Another way of looking at it, more along the lines that you're talking about, is that it depends on your error model. The traditional way of measuring error is RMS pointwise in pixels. Doing some sort of interpolation on pixels gives a pretty good result for that. However, another way to look at it is that it may be better to have a positional error, i.e. a particular color or intensity level is in the wrong spot, than to have an intensity/color error, i.e. you have a pixel that has an intensity/color that's not present in the source signal.
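
The two error models can be compared on a toy case. The sketch below is only an illustration under assumed choices: a 1D edge as ground truth, decimation by 2 as the "camera", and nearest-neighbour versus linear interpolation as the two upscalers. The helper names are hypothetical.

```python
import math


def upscale_nearest(low, factor):
    """Repeat each sample; never creates a value absent from the source."""
    return [v for v in low for _ in range(factor)]


def upscale_linear(low, factor):
    """Linear interpolation; the last sample is held at the right border."""
    out = []
    n = len(low)
    for i in range(n):
        a, b = low[i], low[min(i + 1, n - 1)]
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    return out


def rms_error(a, b):
    """Pointwise root-mean-square error between two equal-length signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def new_values(upscaled, source):
    """Intensity levels in the output that do not occur in the source."""
    return sorted(set(upscaled) - set(source))


truth = [0.0] * 5 + [1.0] * 7   # assumed full-resolution edge
low = truth[::2]                # keep every other sample

nearest = upscale_nearest(low, 2)
linear = upscale_linear(low, 2)

print("nearest RMS:", round(rms_error(nearest, truth), 4),
      "new values:", new_values(nearest, low))
print("linear  RMS:", round(rms_error(linear, truth), 4),
      "new values:", new_values(linear, low))
```

In this example linear interpolation wins on RMS but produces a grey level (0.5) that the source never contained, while nearest-neighbour scores worse on RMS and only puts the edge one pixel off. Which of those counts as the better result depends on the error model you pick.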

This same basic issue was the basis of a big divide in font rendering for many years, where the Mac would render fonts with the exact geometry of the letters, but then anti-aliased, while Windows would use font hinting to make the shape snap to the pixel grid. Personally I thought that the Windows approach was a lot easier to read on a screen, but the Mac approach had the advantage that the geometry of the text would be exactly the same in print as it was on the screen, back when print was something that was important, especially for Mac users.