Hacker News
Deep Bilateral Learning for Real-Time Image Enhancement (groups.csail.mit.edu)
142 points by vadimbaryshev 3239 days ago
8 comments

http://halide-lang.org/ is pretty good at optimising image filters for realtime use on mobile devices.

What neural networks are really good at is cases where feature engineering the transform is difficult or time-consuming, like upscaling resolution (SRGAN). Increasing the dynamic range of LDR images by training with LDR-HDR pairs would be another nice use case. Neural nets for processing 1080p+ images have too many parameters to run well on mobile devices, but it looks like this research gets around that (for some use cases).

Will have to play with the repo!

Film emulation (beyond the usual 3D LUTs for colour matching film stock) would be a fun use case. I wonder how much training data is required.
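For reference, here is a minimal sketch of what a plain 3D LUT does, assuming an RGB float image in [0, 1] and a LUT indexed as `lut[r, g, b]`. The names `apply_lut3d` and `identity_lut` are made up for illustration, and nearest-neighbour lookup is a simplification (interpolating between neighbouring LUT entries would be smoother). The thing to notice is that the output depends only on the input colour, never on the pixel position or the scene content.

```python
import numpy as np

def identity_lut(n):
    """Build an (n, n, n, 3) LUT that maps every colour to itself."""
    axis = np.linspace(0.0, 1.0, n)
    r, g, b = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([r, g, b], axis=-1)

def apply_lut3d(image, lut):
    """Apply a 3D colour LUT with nearest-neighbour lookup.

    image: float array (H, W, 3) with values in [0, 1].
    lut:   float array (N, N, N, 3), indexed as lut[r, g, b] -> output RGB.
    """
    n = lut.shape[0]
    # Snap each channel to the nearest LUT node.
    idx = np.clip(np.rint(image * (n - 1)).astype(int), 0, n - 1)
    # Advanced indexing: one LUT entry per pixel, result is (H, W, 3).
    return lut[idx[..., 0], idx[..., 1], idx[..., 2]]

# Hypothetical "warm" look: start from identity and lift the red channel.
warm = identity_lut(17)
warm[..., 0] = np.clip(warm[..., 0] * 1.1, 0.0, 1.0)
```

Two pixels with the same RGB value always come out identical here, which is exactly the limitation a learned, scene-dependent transform would be trying to get past.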

Film emulation sounds like a special case of style transfer. Those run from a single image, so it might be reasonable to emulate it with very little data.
I think accurate film emulation would require a fair amount of training material pairs (digital/film) to learn the transformation between colours, colour/scene dependent dynamic range compression, and other artefacts like local contrast. The paper mentions using 4000 training pairs for their HDR+ example.
They don't process the whole 1080p image; they downsample it to 256x256.
The buried lede is the awesome demo: https://youtu.be/GAe0qKKQY_I?t=130
I really would like to see them try different training sets that vary the "styles" of retouching. This example looks like it's strongly biased to the "make the images pop!" style of retouching, blowing highlights, shadows and contrasts.

What if the input set has more subtle retouching that pulls highlights and pushes shadows, but without the aforementioned issues?

What if they got their hands on the unedited and edited Magnum photos? That would produce an interesting B&W filter, for sure!

https://www.slrlounge.com/magnum-photos-darkroom-magic-genes...

I wonder how many images are required to train a network like this?

If it's in the millions, getting pre- and post-retouching image pairs in such a quantity is likely impractical.

Ah, right, that could also be a limitation.

Although I'm pretty sure Magnum Photos has a large quantity of images, but perhaps not all in a consistent style.

What exactly is awesome about it? What are they actually achieving that is impressive?

This is basically doing something at low resolution and applying the transformation to the high resolution image using a bilateral filter to make the interpolation respect edges. There isn't really anything new here except for the combination of buzzwords in the title.
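To make the "low resolution, then edge-respecting upsampling" idea concrete, here is a rough NumPy sketch of slicing a bilateral grid. It is not the authors' code. The assumptions: a grayscale image in [0, 1], a grid that stores one (gain, offset) pair per cell, and the pixel intensity itself used as the third grid coordinate. The function name `slice_and_apply` is made up, and the grid is filled by hand rather than predicted by a network.

```python
import numpy as np

def slice_and_apply(image, grid):
    """Look up per-pixel (gain, offset) in a coarse grid and apply them.

    image: float array (H, W) in [0, 1], full resolution.
    grid:  float array (gh, gw, gd, 2), coarse in y, x and intensity.
    """
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    gh, gw, gd, _ = grid.shape

    # Continuous grid coordinates for every full-resolution pixel.
    ys = np.broadcast_to(np.linspace(0, gh - 1, h)[:, None], (h, w))
    xs = np.broadcast_to(np.linspace(0, gw - 1, w)[None, :], (h, w))
    zs = np.clip(image, 0.0, 1.0) * (gd - 1)  # intensity picks the depth bin

    def corners(coord, size):
        # Two neighbouring grid indices along one axis, with linear weights.
        lo = np.floor(coord).astype(int)
        hi = np.minimum(lo + 1, size - 1)
        frac = coord - lo
        return ((lo, 1.0 - frac), (hi, frac))

    # Trilinear interpolation: blend the 8 surrounding grid cells.
    coeffs = np.zeros((h, w, 2))
    for yi, wy in corners(ys, gh):
        for xi, wx in corners(xs, gw):
            for zi, wz in corners(zs, gd):
                coeffs += (wy * wx * wz)[..., None] * grid[yi, xi, zi]

    return coeffs[..., 0] * image + coeffs[..., 1]

# Toy grid: dark intensities get gain 2, bright intensities get gain 1.
grid = np.zeros((1, 1, 2, 2))
grid[..., 0, :] = (2.0, 0.0)
grid[..., 1, :] = (1.0, 0.0)
result = slice_and_apply(np.array([[0.0, 0.5, 1.0]]), grid)
```

Because intensity is one of the lookup axes, two neighbouring pixels on opposite sides of an edge land in different cells and get different coefficients, even though the grid is tiny. That is where the edge-respecting behaviour comes from.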

I think this paper is showing that you "can" train an auto exposure/white balancing/edit flow algorithm with a DL pipeline, but the results do not necessarily mean it will outperform the simpler and cheaper auto exposure/white balancing algorithms that are out there. And the flexibility in this approach also allows masking and background removal.

However, most of the examples in the paper in fact show improvements in exposure and color. If you import those images and tweak 3 or 4 adjustments (clarity, curves, exposure, saturation) in Polarr or Lightroom, you will quickly get very close to the result produced by this paper. Still, it is impressive that it could get to an exposure histogram that looks exactly like the ground truth.

Maybe someone can benchmark this against the Google Photos auto-enhance. A lot of people turn auto-enhance in Google Photos off because it sometimes creates unnatural looks for photos, which are tolerable to everyday consumers, but to pros it just looks bad.

Lastly, if you look very closely at the input images, some of them appear to be artificially adjusted to show how the model works (last page, 4th row, first image, which looks both underexposed and overexposed after damping brightness through post-processing), and these input images are not always the type of images you can get from cameras.

Link to github repo is 404'ing (https://github.com/google/hdrnet)
They have added a tooltip. It now says it is coming this week.
"Enhancement" meaning tone-mapping? Are neural networks really required for that? Seems like a lot of heavy machinery for the resulting filter, but maybe tone-mapping standards have gone up.
They imply "human operator" level retouching, so potentially some combination of tone mapping, unsharp mask, edge enhancement, etc. as a single NN operation. It's also <30ms for 1080p on mobile, so potentially better than average speed.
This is effectively "tone mapping" that is aware of context. I.e., a face and a shoe might have the exact same colour, yet they can end up different colours after processing, even if they appear in the same image.
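For contrast, here is a sketch of a plain global tone curve, the context-free kind. It uses a Reinhard-style `x / (1 + x)` curve followed by display gamma. Applying it per channel instead of on luminance is a simplifying assumption, and `reinhard_tonemap` is just an illustrative name.

```python
import numpy as np

def reinhard_tonemap(hdr, gamma=2.2):
    """Global tone curve: compress [0, inf) into [0, 1), then apply gamma.

    hdr: float array of linear radiance values (any shape), negatives clamped.
    """
    hdr = np.maximum(np.asarray(hdr, dtype=float), 0.0)
    ldr = hdr / (1.0 + hdr)        # same curve for every pixel
    return ldr ** (1.0 / gamma)    # encode for display

# Two pixels with the same input value always get the same output,
# no matter what object they belong to.
pixels = np.array([0.0, 0.5, 1.0, 4.0, 4.0])
mapped = reinhard_tonemap(pixels)
```

With a curve like this the face and the shoe cannot diverge, so any difference between them after processing has to come from a spatially varying or content-aware step.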
this is the most complicated histo-stretch I've ever seen
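For anyone who wants the uncomplicated version to compare against, a percentile-based histogram stretch fits in a few lines. This is a generic sketch, not anything from the paper; the function name and the 1%/99% defaults are arbitrary choices.

```python
import numpy as np

def histogram_stretch(image, low_pct=1.0, high_pct=99.0):
    """Linearly rescale so the chosen percentiles map to 0 and 1."""
    image = np.asarray(image, dtype=float)
    lo, hi = np.percentile(image, [low_pct, high_pct])
    if hi <= lo:
        # Flat image: nothing to stretch.
        return np.zeros_like(image)
    return np.clip((image - lo) / (hi - lo), 0.0, 1.0)

# A low-contrast strip that only uses the range 0.2 to 0.6.
stretched = histogram_stretch(np.array([0.2, 0.4, 0.6]), 0.0, 100.0)
```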
Very interesting approach. And it is always great to see teams provide actual pre-trained models. The less work people have to put in to reproduce your claims, the more likely you are to be taken seriously.

The code unfortunately returns a 404 for now. Hopefully, that is fixed soon.

I find the examples of face brightening to detract from my impression of the entire work. Those images look so awkward, and are such poor photography that I'm not sure why someone would want to emulate them.