Hacker News
by f- 2256 days ago
As a photographer, the comparison to "raw" results without color balance or noise removal seems somewhat deceptive. The effects visible in the video seem easy to quickly replicate with existing techniques, such as the "surface blur" filter that averages out pixel values in areas with similar color.

This happens at the expense of detail in low-contrast areas, producing a plastic-like appearance of human skin and hair, and making low-contrast text unintelligible, which is why it's generally not done by default.
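
To make the trade-off concrete, here is a minimal Python/NumPy sketch of a surface-blur-style filter. It is a simplification, not the algorithm any particular editor ships: the hard `threshold` test, the `radius` parameter, and the synthetic test image are all assumptions for illustration. Each pixel is averaged only with neighbours of similar value, so the strong edge survives while the faint checker pattern (standing in for low-contrast detail) gets flattened.

```python
import numpy as np

def surface_blur(img, radius=2, threshold=0.1):
    """Average each pixel with the window neighbours whose value is within
    `threshold` of it. Simplified sketch; `img` is a 2-D float array."""
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    total = np.zeros((h, w))
    count = np.zeros((h, w))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbour at offset (dy, dx) for every pixel at once.
            shifted = padded[radius + dy : radius + dy + h,
                             radius + dx : radius + dx + w]
            similar = np.abs(shifted - img) <= threshold
            total += np.where(similar, shifted, 0.0)
            count += similar
    # count >= 1 everywhere because a pixel always matches itself.
    return total / count

# Hypothetical test image: dark left half, bright right half, plus a faint
# checker pattern standing in for low-contrast text or skin texture.
y, x = np.mgrid[0:8, 0:8]
img = np.where(x < 4, 0.2, 0.8) + 0.02 * (((x + y) % 2) * 2 - 1)
out = surface_blur(img)

print("texture std before:", img[:, :4].std())
print("texture std after: ", out[:, :4].std())
print("edge step after:   ", out[:, 4:].mean() - out[:, :4].mean())
```

On this synthetic image the faint pattern is mostly averaged away while the step between the two halves stays where it was, which is the "plastic" look in miniature.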

7 comments

The comparison is fair because it tries to automate expertise.

I'm sure you know exactly how much of which filter to apply for similar results. Laymen like ourselves will need a lot more trial and error. Their contribution here is to provide a push-button, automated mechanism.

I would have probably also tried something simple and given up due to the noise. So this is definitely interesting.

This is completely untrue.

What you are describing is usually called automatic tone mapping. This is basically noise reduction and possibly color normalization from brightening a dark image. Showing their black image as the starting point is silly, because JPEG compression will make a mess of the remaining information. What they should show is the raw image brightened by a straight multiplier, which gives the noisy version you would get from increasing brightness in a trivial way.
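
A rough sketch of what "brightened by a straight multiplier" means, in Python/NumPy. Everything here is an assumption for illustration: the flat dark patch, the additive Gaussian noise model, the `gain` of 100, and the `white_level` of 1.0. Real sensor noise is more complicated, but the point carries over: the multiplier scales signal and noise by the same factor, so the result is bright and just as noisy relative to the signal.

```python
import numpy as np

def brighten_linear(raw, gain, white_level=1.0):
    """Scale linear sensor values by a constant gain and clip to the valid range."""
    return np.clip(raw * gain, 0.0, white_level)

rng = np.random.default_rng(0)

# Hypothetical very dark, flat patch in linear units (white = 1.0).
scene = np.full((64, 64), 0.005)
# Assumed noise model: additive Gaussian noise on the linear values.
raw = scene + rng.normal(0.0, 0.001, scene.shape)

bright = brighten_linear(raw, gain=100.0)

print("raw    mean/std:", raw.mean(), raw.std())
print("bright mean/std:", bright.mean(), bright.std())
print("signal-to-noise before:", raw.mean() / raw.std())
print("signal-to-noise after: ", bright.mean() / bright.std())
```

The signal-to-noise ratio barely moves, which is why a plain multiplier on the raw data is the honest baseline to compare a denoiser or a network against.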

What jpg? They are using raw data.
The image on GitHub is a JPEG made from the RAW file. Since a RAW file has more dynamic range and contains a lot more information than a JPEG, you can open that photo in an editor and crank up the brightness. You will get a noisy image, but it will be a lot brighter and will probably resemble the high-ISO image in the middle. Then you can apply a denoiser in the editor to get results similar to the last one.

So presumably this neural net more or less does it for you.

The PNG is there just to show the results produced by the CNN. If you watch the linked video, they do exactly what you are suggesting and then compare both results.
Their example on their GitHub page uses a JPEG that makes it look like they are creating something from nothing.
To me the results seem vastly superior to those sorts of simple DSP algorithms. The video shows a comparison with some denoising: https://youtu.be/qWKUFK7MWvg?t=102
Your example strikes me as the kind of thing neural networks are much better at than a fixed filter. You or I could easily identify regions of an image where it's safe vs unsafe to do the surface averaging, and boundaries where we wouldn't want to mix up the averages. (For example, averaging text should be fine, so long as you don't cross the text boundaries.) A CNN should also be able to learn to do this pretty easily.
What you are describing is a class of filters known as edge-preserving filters. You can look at bilateral filters and guided filters for examples that have been around for decades at this point.
So we can do a decent job with hand-designed filters... Why aren't they used for the problem the parent describes? Are they not good enough to deal with small text boundaries?

A lot of hand-built filters (I see a lot of these in the audio space) have many hand-tuned parameters, which work well in some circumstances and less well in others. One of the big advantages of NN systems is the ability to adapt to context more dynamically. The NN filters can generally emulate the hand-designed system and pick out weightings appropriate to the example.

This is effectively noise reduction, which bilateral and guided filters are actually used for. They compute their kernel weights from local pixel values and statistics. You can also look up other edge-preserving filters like BM3D and non-local means.

I don't know what you mean by hand-made filters, and I don't know why that's a conclusion you jumped to.
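
For readers who haven't met one, here is a minimal Python/NumPy sketch of a bilateral filter, to show what weights based on local pixels look like in practice. It is a brute-force illustration rather than a production implementation, and the parameter names (`radius`, `sigma_space`, `sigma_range`) and the noisy step-edge test image are assumptions. Each neighbour's weight is the product of a Gaussian on spatial distance and a Gaussian on intensity difference, so pixels across a strong edge contribute almost nothing.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_space=1.5, sigma_range=0.1):
    """Brute-force bilateral filter for a 2-D float image (illustrative sketch)."""
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    num = np.zeros((h, w))
    den = np.zeros((h, w))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = padded[radius + dy : radius + dy + h,
                             radius + dx : radius + dx + w]
            # Spatial weight: depends only on the offset from the centre pixel.
            w_space = np.exp(-(dx * dx + dy * dy) / (2 * sigma_space**2))
            # Range weight: depends on how different the neighbour's value is.
            w_range = np.exp(-((shifted - img) ** 2) / (2 * sigma_range**2))
            weight = w_space * w_range
            num += weight * shifted
            den += weight
    return num / den

# Hypothetical test image: a step edge from 0.2 to 0.8 with Gaussian noise.
rng = np.random.default_rng(0)
clean = np.where(np.arange(32) < 16, 0.2, 0.8) * np.ones((32, 1))
noisy = clean + rng.normal(0.0, 0.03, clean.shape)
out = bilateral_filter(noisy)

print("flat-region std before:", noisy[:, :16].std())
print("flat-region std after: ", out[:, :16].std())
print("edge step after:       ", out[:, 16:].mean() - out[:, :16].mean())
```

The weights are recomputed for every pixel from its own neighbourhood, which is the sense in which the kernel adapts to the image; `sigma_range` still has to be chosen by hand, and a value large compared with the detail you care about will smooth that detail away.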

>As a photographer, the comparison to "raw" results without color balance or noise removal seems somewhat deceptive.

Huh? At 1:40 in the video that's exactly what they do.

Interestingly, this effect is notably visible in their example image [0]. Notice the distinctly "plasticized" appearance of the book cover, and how the text is not intelligible in the low-contrast areas of the reflection.

[0]: https://raw.githubusercontent.com/cchen156/Learning-to-See-i...

Note (a) and (b) are separate photographs (different angles and everything), and that (c) is based on (a), not (b); comparing the glare between (b) and (c) isn't quite an even comparison.
Oh gosh. Taking one second to think about it, _of course_ (a) and (b) are separate photographs -- that is the entire point of that diagram. Somehow my brain farted right over that when making my previous comment.

Thank you, not only for setting me straight, but also for doing so as kindly as you did.

It would indeed be interesting to see a comparison with, for instance, non-local means on the scaled raw image. The speed is superior in any case, I suspect.
I always have this complaint too. It's fundamentally a lossy process, in the hand-wavy sense. It's more "impressive" looking, but it actually conveys less real detail.