by mfabbri77 677 days ago
I am quite convinced that if the goal is the best possible output quality, then the best approach is to analytically compute the non-overlapping areas of each polygon within each pixel, resolving all contributions (areas) together in a single pass for each pixel.

Why are you convinced of this, and can I help unconvince you? ;) What you describe is what’s called “Box Filtering” in the article. Box filtering is well studied, and it is known to not be the best possible output quality. The reason this is not the best approach is because a pixel is not a little square, a pixel is a sample of a signal, and it has to be approached with signal processing and human perception in mind. (See the famous paper linked in the article: A Pixel is Not a Little Square, A Pixel is Not a Little Square, A Pixel is Not a Little Square http://alvyray.com/Memos/CG/Microsoft/6_pixel.pdf)

It can be surprising at first, but when you analytically compute the area of non-overlapping parts of a pixel (i.e., use Box Filtering) you can introduce high frequencies that cause visible aliasing artifacts that will never go away. This is also true if you are using sub-sampling of a pixel, taking point samples and averaging them, no matter how many samples you take.
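
A rough 1D illustration of that last point (my own sketch, not code from the article): take a cosine at 1.5 cycles per pixel, an assumed test signal well above the Nyquist limit of 0.5, and average it over each pixel. The names `box_exact` and `supersampled` are made up for this example, and pixels are assumed to be centered on integers. More point samples only converge toward the exact box integral; they don't remove what the box lets through.

```python
import math

def box_exact(freq, pixel):
    """Exact average of cos(2*pi*freq*x) over the pixel [pixel - 0.5, pixel + 0.5]."""
    w = 2 * math.pi * freq
    return (math.sin(w * (pixel + 0.5)) - math.sin(w * (pixel - 0.5))) / w

def supersampled(freq, pixel, n):
    """Average of n evenly spaced point samples inside the same pixel."""
    total = 0.0
    for k in range(n):
        x = pixel - 0.5 + (k + 0.5) / n
        total += math.cos(2 * math.pi * freq * x)
    return total / n

freq = 1.5  # cycles per pixel (assumed test frequency)
for n in (4, 64, 1024):
    print(n, [round(supersampled(freq, p, n), 4) for p in range(4)])
print("exact", [round(box_exact(freq, p), 4) for p in range(4)])
```

The exact values flip sign from one pixel to the next with a magnitude of about 0.21, so the too-fine detail survives as a pattern at the pixel scale instead of averaging away, and raising the sample count just reproduces that result more precisely.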

You can see the aliasing I’m talking about in the example at the top of the article, the 3rd one is the Box Filter - equivalent to computing the area of the polygons within each pixel. Look closely near the center of the circle where all the lines converge, and you can see little artifacts above and below, and to the left and right of the center, artifacts that are not there in the “Bilinear Filter” example on the right.

I think the story is a lot more complicated. Talking about "the best possible output quality" is a big claim, and I have no reason to believe it can be achieved by mathematically simple techniques (i.e., linear convolution with a kernel). Quality is ultimately a function of human perception, which is complex and poorly understood, and optimizing for that is similarly not going to be easy.

The Mitchell-Netravali paper[1] correctly describes sampling as a tradeoff space. If you optimize for frequency response (brick wall rejection of aliasing) the impulse response is sinc and you get a lot of ringing. If you optimize for total rejection of aliasing while maintaining positive support, you get something that looks like a Gaussian impulse response, which is very smooth but blurry. And if you optimize for small spatial support and lack of ringing, you get a box filter, which lets some aliasing through.
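
For reference, the two-parameter cubic family from [1] fits in a few lines. This is the formula as I recall it from the paper, so check it against [1] before relying on it; the function name is mine, and the defaults B = C = 1/3 are the values the paper recommends.

```python
def mitchell_netravali(x, b=1/3, c=1/3):
    """Evaluate the Mitchell-Netravali cubic kernel at offset x (in pixels)."""
    x = abs(x)
    if x < 1:
        return ((12 - 9 * b - 6 * c) * x**3
                + (-18 + 12 * b + 6 * c) * x**2
                + (6 - 2 * b)) / 6
    if x < 2:
        return ((-b - 6 * c) * x**3
                + (6 * b + 30 * c) * x**2
                + (-12 * b - 48 * c) * x
                + (8 * b + 24 * c)) / 6
    return 0.0  # the kernel has support |x| < 2

# Sample the kernel at a few offsets for two parameter choices.
for label, b, c in (("B=C=1/3", 1/3, 1/3), ("B=1, C=0 (cubic B-spline)", 1.0, 0.0)):
    print(label, [round(mitchell_netravali(t, b, c), 4) for t in (0, 0.5, 1, 1.5)])
```

With B = 1, C = 0 the kernel is the cubic B-spline, which never goes negative and is smooth but soft. With B = C = 1/3 it dips below zero between |x| = 1 and |x| = 2, and that negative lobe is where the ringing comes from.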

Which is best, I think, depends on what you're filtering. For natural scenes, you can make an argument that the oblique projection approach of Rocha et al[2] is the optimal point in the tradeoff space. I tried it on text, though, and there were noticeable ringing artifacts; box filtering is definitely better quality to my eyes.

I like to think about antialiasing specific test images. The Siemens star is very sensitive in showing aliasing, but it also makes sense to look at a half-plane and a thin line, as they're more accurate models of real 2D scenes that people care about. It's hard to imagine doing better than a box filter for a half-plane; either you get ringing (which has the additional negative impact of clipping when the half-planes are at the gamut boundary of the display; not something you have to worry about with natural images) or blurriness. In particular, a tent filter is going to be softer but your eye won't pick up the reduction in aliasing, though it is certainly present in the frequency domain.

A thin line is a different story. With a box filter, you get basically a non-antialiased line of single-pixel thickness, just with less alpha, and it's clearly possible to do better; a tent filter is going to look better.

But a thin line is just a linear combination of two half-planes. So if you accept that a box filter is better visual quality than a tent filter for a half-plane, and the other way around for a thin line, then the conclusion is that linear filtering is not the correct path to truly highest quality.
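
Here is a small 1D sketch of the two test cases (my own construction; unit pixel spacing, pixel centers at integers, and a tent of radius 1 are all assumptions, as are the function names). It leans on the fact that a thin line is the difference of two half-planes, so one cumulative integral per filter handles both.

```python
def box_cdf(t):
    """Integral of the unit-width box filter from -infinity to t (t relative to pixel center)."""
    return min(1.0, max(0.0, t + 0.5))

def tent_cdf(t):
    """Integral of the radius-1 tent filter from -infinity to t."""
    if t <= -1:
        return 0.0
    if t <= 0:
        return 0.5 * (t + 1) ** 2
    if t < 1:
        return 1.0 - 0.5 * (1 - t) ** 2
    return 1.0

def half_plane(cdf, edge, center):
    """Filtered value at a pixel center for the half-plane x >= edge."""
    return 1.0 - cdf(edge - center)

def thin_line(cdf, left, right, center):
    """A line covering [left, right] is one half-plane minus another."""
    return half_plane(cdf, left, center) - half_plane(cdf, right, center)

# Two 0.2-pixel-wide lines at different positions inside pixel 1 (which spans [0.5, 1.5]).
for left, right in ((0.6, 0.8), (1.2, 1.4)):
    box = [round(thin_line(box_cdf, left, right, c), 3) for c in range(3)]
    tent = [round(thin_line(tent_cdf, left, right, c), 3) for c in range(3)]
    print((left, right), "box:", box, "tent:", tent)
```

Both lines give the box filter the same single value of 0.2 in pixel 1, so the sub-pixel position is lost. The tent splits the same total between neighboring pixels according to where the line sits.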

With the exception of thin lines, for most 2D scenes a box filter with antialiasing done in the correct color space is very close to the best quality - maybe the midwit meme applies, and it does make sense to model a pixel as a little square in that case. But I am interested in the question of how to truly achieve the best quality, and I don't think we really know the answer yet.
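
To spell out what "the correct color space" might mean in practice, here is a minimal sketch that assumes an sRGB display, which nothing above actually specifies. Coverage is mixed in linear light and only then encoded; the function names are illustrative.

```python
def srgb_to_linear(v):
    """Decode an sRGB-encoded value in [0, 1] to linear light."""
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

def linear_to_srgb(v):
    """Encode a linear-light value in [0, 1] to sRGB."""
    return v * 12.92 if v <= 0.0031308 else 1.055 * v ** (1 / 2.4) - 0.055

def blend_edge_pixel(fg, bg, coverage):
    """Mix two sRGB-encoded gray levels by area coverage, in linear light."""
    lin = coverage * srgb_to_linear(fg) + (1 - coverage) * srgb_to_linear(bg)
    return linear_to_srgb(lin)

# A pixel half covered by white over black.
naive = 0.5 * 1.0 + 0.5 * 0.0            # mixing the encoded values directly
correct = blend_edge_pixel(1.0, 0.0, 0.5)
print("encoded-space mix:", round(naive * 255), "linear-light mix:", round(correct * 255))
```

Mixing the encoded values directly gives a noticeably darker edge pixel than mixing in linear light, which is one reason edges can look uneven even when the coverage is computed exactly.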

[1] https://www.cs.utexas.edu/~fussell/courses/cs384g-fall2013/l...

[2] https://www.inf.ufrgs.br/~eslgastal/SBS3/Rocha_Oliveira_Gast...

In my opinion, if you break down all the polygons in your scene into non-overlapping polygons, then clip them into pixels, calculate the color of each piece of polygon (applying all paints, blend modes, etc.) and sum it up, then in the end that's the best visual quality you can get. And that's the idea I'm working on, but it involves the decomposition/clip step on the CPU, while the sum of paint/blend is done by the GPU.
That isn’t true. Again, please look more closely at the first example in the article, and take the time to understand it. It demonstrates there’s a better method than what you’re suggesting, proving that clipping to pixels and summing the area is not the best visual quality you can get.
As pointed out by Raphlinus, the moire pattern in the Siemens star isn't such a significant quality indicator for the type of content usually encountered in 2D vector graphics. With the analytical coverage calculation you can have perfect font/text rendering, perfect thin lines/shapes and, by solving all the areas at once, no conflating artifacts.
Raph made an argument that Box is good enough for lots of things, which is subjective and depends entirely on what things you’re doing, and how much you actually care about quality.

You are claiming it’s the best possible. Box filter is simply not the best possible, and this fact is well understood and documented.

You can relax your claim to say it’s good enough for what you need, and I won’t disagree with you anymore. Personally, I’m sensitive to visible pixelation, and the Box Filter will always result in some visible pixelation with all 2D vector graphics, so if you really care about high quality rendering, I’m very skeptical that you really want Box filtering as the ideal target. Box filter is a compromise, it’s easier & faster to compute. But it’s not the highest quality. It would be good to understand why that’s the case.

* Edit to further clarify and respond to this:

> With the analytical coverage calculation you can have perfect font/text rendering, perfect thin lines/shapes and, by solving all the areas at once, no conflating artifacts.

You cannot get perfect font or text rendering with a Box filter, and you will get some conflating artifacts. They might be very slight, and not bothersome to most people, but they do exist with a Box filter, always. This is a mathematical property of Box filtering, not a subjective claim.

I don't see how you can support the claim of perfect thin line rendering, it's visibly just not very good. So box filtering logically can't possibly be the best possible quality.

Can we make a magical adaptive filter which resembles box filter for half-planes, a tent filter for thin lines, Mitchell-Netravali or oblique projection for natural images, and Gaussian when filtering images for which high frequency detail is not important? Perhaps, but that feels like advanced research, and also computationally expensive. I don't think you can claim "perfect" without backing it up with human factors data really demonstrating that the filtered images are optimum with respect to perceived quality.

Backing up to your earlier comment. Pixels on some displays are in fact little squares of uniform color. The question then is how to color a pixel given geometry with detail within that square.

All of this "filtering" is variations on adding blur. In fact the article extends the technique to deliberately blur images on a larger scale. When we integrate a function (which could be a color gradient over a fully filled polygon) and then paint the little square with a solid "average" color that's also a form of blurring (more like distorting in this case) the detail.

It is notable that the examples given are moving, which means moire patterns and other artifacts will have frame-to-frame effects that may be annoying visually. Simply blurring the image takes care of that at the expense of eliminating what looks like detail but may not actually be meaningful. Some of the less blurry images seem to have radial lines that bend and go back out in another location for example, so I'd call that false detail. It may actually be better to blur such detail instead of leaving it looking sharper with false contours.

Yes it’s a good point that LCD pixels are more square than the CRTs that were ubiquitous when Alvy Ray wrote his paper. I think I even made that point before on HN somewhere. I did mention in response to Raph that yes the ideal target depends on what the display is, and the filter choice does depend on whether it’s LCD, CRT, film, print, or something else. That said, LCD pixels are not perfect little squares, and they’re almost never uniform color. The ideal filter for LCDs might be kinda complicated, and you’d probably need three RGB-separated filters.

Conceptually, what we’re doing is low-pass filtering, rather than blurring, so I wouldn’t necessarily call filtering just “adding blur”, but in some sense those two ideas are very close to each other, so I wouldn’t call it wrong either. :P The render filtering is a convolution integral, and is slightly different than adding blur to an image without taking the pixel shape into account. Here the filter’s quality depends on taking the pixel shape into account.

You’re right about making note of the animated examples - this is because it’s easier to demonstrate aliasing when animated. The ‘false detail’ is also aliasing, and does arise because the filtering didn’t adequately filter out high frequencies, so they’ve been sampled incorrectly and lead to incorrect image reconstruction. I totally agree that if you get such aliasing false detail, it’s preferable to err (slightly) on the side of blurry, rather than sharp and wrong.

I don’t know of any display technology in which pixels are little squares, if you really get out the magnifying glass.
Oh I agree with all of that. And nice to see you on HN Raph - was nice to meet you at HPG the other day.

It’s subjective, so box filter being ‘close’ is a somewhat accurate statement. I’m coming from the film world, and so I have a pretty hard time agreeing that it’s “very” close. Box filter breaks often and easily, especially under animation, but it’s certainly better than nearest neighbor sampling, if that’s our baseline. Box filter is pretty bad for nearly any scenario where there are frequencies higher than the pixel spacing, which includes textures, patterns, thin lines, and all kinds of things, and the real world is full of these box-filter-confounding features.

One interesting question to ask is whether you the viewer can reliably identify the size of a pixel anywhere in the image. If you can see any stepping of any kind, the pixel size is visible, and that means the filter is inadequate and cannot achieve “best possible output quality”. Most people are not sensitive to this at all, but I’ve sat through many filter evaluation sessions with film directors and lighting/vfx supervisors who are insanely sensitive to the differences between well tuned and closely matching Mitchell and Gaussian filters, for example. Personally, for various reasons based on past experience, I think it’s better to err slightly on the side of too blurry than too sharp. I’d rather use a Gaussian than bicubic, but the film people don’t necessarily agree and they think Gaussian is too blurry once you eliminate aliasing. Once you find the sharpest Gaussian you can that doesn’t alias, you will not be able to identify the size of a pixel - image features transition from sharp to blurry as you consider smaller scales, but pixel boundaries are not visible. I’ve never personally seen another filter that does this always, even under contrived scenarios.

That said, I still think it’s tautologically true that box filter is simply not the “best” quality, even if we’re talking about very minor differences. Bilinear and Bicubic are always as good or better, even when the lay person can’t see the differences (or when they don’t know what to look for).

My opinion is that there is no such thing as “best” output quality. We are in a tradeoff space, and the optimal result depends on goals that need to be stated explicitly and elaborated carefully. It depends heavily on the specific display, who/what is looking at the display, what the viewer cares about, what the surrounding environment is like, etc.

* edit just to add that even though I don’t think “best” visual quality exists, I do think box filter can never get there; the contention for top spot is between the higher order filters, and box filter isn’t even in the running. I had meant to mention that even a single 2D plane that is black on one side and white on the other, when rendered with box filter, yields an edge in which you can identify visible stepping. If you handle gamma & color properly, you can minimize it, but you can still see the pixels, even in this simplest of all cases. For me, that’s one reason box filter is disqualified from any discussion of high quality rendering.

If there's one thing I've learned from image processing it's that the idea of a pixel as a perfect square is somewhat overrated.

Anti-aliasing is exactly as it sounds, a low-pass filter to prevent artefacts. Convolution with a square pulse is serviceable, but is not actually that good a low-pass filter, you get all kinds of moire effects. This is why a Bicubic kernel that kind of mimics a perfect low-pass filter (which would be a sinc kernel), can perform better.
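
The two simplest kernels have closed-form frequency responses, which makes a rough comparison easy to sketch. A unit-width box has response sinc(f), and a radius-1 tent, being that box convolved with itself, has sinc(f)^2. This doesn't cover bicubic kernels; the frequencies printed are arbitrary picks, with f in cycles per pixel, and the function names are mine.

```python
import math

def sinc(f):
    """Normalized sinc: sin(pi f) / (pi f), with the limit value 1 at f = 0."""
    return 1.0 if f == 0 else math.sin(math.pi * f) / (math.pi * f)

def box_response(f):
    """Frequency response of a unit-width box filter."""
    return sinc(f)

def tent_response(f):
    """Frequency response of a radius-1 tent (a box convolved with itself)."""
    return sinc(f) ** 2

# Nyquist is 0.5 cycles/pixel; anything above it that survives shows up as aliasing.
for f in (0.25, 0.5, 0.75, 1.5, 2.5):
    print(f, round(abs(box_response(f)), 4), round(tent_response(f), 4))
```

Above Nyquist the box lets through several times more than the tent (roughly 0.21 versus 0.045 at 1.5 cycles per pixel), which fits the moire complaint. The tent pays for that by attenuating more below Nyquist, which is the softness.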

It is tempting to use a square kernel though, because it's pretty much the sharpest possible method of acceptable quality.

I've been looking into how viable this is as a performant strategy. If you have non-overlapping areas, then contributions to a single pixel can be made independently (since it is just the sum of contributions). The usual approach (computing coverage and blending into the color) is more constrained, where the operations need to be done in back-to-front order.
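A toy single-pixel comparison of the two approaches (my own sketch with made-up function names; colors are assumed to be linear RGB and the fragments are assumed not to overlap): two red shapes that abut inside a pixel, each covering half of it, over a white background.

```python
def blend_over(dst, src, alpha):
    """Standard 'over' blend of src onto dst with the given alpha."""
    return tuple(alpha * s + (1 - alpha) * d for s, d in zip(src, dst))

def composite_by_coverage(background, fragments):
    """Back-to-front blending that treats each fragment's coverage as alpha."""
    color = background
    for frag_color, coverage in fragments:
        color = blend_over(color, frag_color, coverage)
    return color

def composite_by_area(background, fragments):
    """Area-weighted sum over non-overlapping fragments; order does not matter."""
    covered = sum(coverage for _, coverage in fragments)
    color = tuple((1 - covered) * ch for ch in background)
    for frag_color, coverage in fragments:
        color = tuple(acc + coverage * ch for acc, ch in zip(color, frag_color))
    return color

WHITE = (1.0, 1.0, 1.0)
RED = (1.0, 0.0, 0.0)
fragments = [(RED, 0.5), (RED, 0.5)]  # left half and right half of the pixel

print("coverage as alpha:", composite_by_coverage(WHITE, fragments))
print("summed areas:     ", composite_by_area(WHITE, fragments))
```

Blending coverage as alpha leaves a quarter of the white background showing through a seam that should be solid red, while the area sum gets pure red regardless of the order the fragments arrive in.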
I've been researching this field for 20 years (I'm one of the developers of AmanithVG). Unfortunately, no matter how fast they are made, all the algorithms to analytically decompose areas involve a step to find intersections and therefore sweepline approaches that are difficult to parallelize and therefore must be done in CPU. However, we are working on it for the next AmanithVG rasterizer, so I'm keeping my eyes open for all possible alternatives.
I ran across https://dl.acm.org/doi/pdf/10.1145/72935.72950 a few weeks ago, it seems like a potential non-sweepline highly-parallel method. I've had some promising results for first doing a higher-dimensional Hilbert-sort (giving spatial locality), and then being able to prune a very large percentage of the quadratic search space. It might still be too slow on the GPU. I'm curious if you have any write-ups on things that have been explored, or if I'd be able to pick your brain some time!
I believe Vello does this for AA (though I can't find the source now), and it's very fast, running on the GPU via compute shaders.
No, Vello does not analytically find intersections. Compositing is (currently) done by alpha blending, which is consistent with the W3C spec but has its own tradeoffs.
> compute the non-overlapping areas of each polygon within each pixel

In the given example (periodic checkerboard), that would be impossible because the pixels that touch the horizon intersect an infinite number of polygons.

Not that TFA solves that problem either. As far as I know the exact rendering of a periodic pattern in perspective is an open problem.