by jaschasd 1802 days ago
They show the source images next to the super-resolution output on the website for the iterative refinement paper: https://iterative-refinement.github.io/

I see only two examples of reference images on that page, but they have quite a few more examples in the paper itself: https://arxiv.org/pdf/2104.07636.pdf

The results, compared with the reference images, pretty clearly show that the technique isn't "magical" in any sense of the term - meaning that the ML algorithm doesn't rediscover real details that a human could not infer were there from the original.

While the results are certainly incredibly good looking and an advance in the state of the art, if you were hoping to use this as the basis of an image compression codec you will probably be disappointed. Take a look at the woman's hair on page 16 for an excellent example of this. Even at its best the algorithm seems to come up with something plausible but often wrong (exactly as expected). Look at the small details around the eyes for good examples of this.

It's worth comparing the output to some previous results from an algorithm called PULSE (which they show in the paper). The latter is almost always horrifying in some way. As far as I can tell, it frequently makes assumptions that masculinize the subjects or make them look more white / European. SR3 doesn't seem to have this problem.

(Note: to be clear, I do think this is excellent work. Just trying to qualify the results for those in the comments who are mostly interested in fidelity.)

I could be wrong, but your expectation seems to involve fantasy - I didn't think anybody expected any algorithm to rediscover the real details. Is that not impossible unless the model was trained on the specific image it's recreating (which would be pointless)?

That's not my expectation. As I said,

> Even at its best the algorithm seems to come up with something plausible but often wrong (exactly as expected)

So the expectation is that the algorithm is only as good as, or slightly worse than, what a human would assume the original photograph looked like. SR3 meets that expectation, more or less.

That said, I would argue that there's no reason to assume that humans are equipped with maximally efficient upscaling / content-aware filling algorithms in our brains, either. It might certainly be possible to come up with an ML approach that is able to discover details in the image that are entirely lost on even highly trained human viewers.

I also wanted to point out in my comment that the approach is still pretty flawed in a lot of ways, even compared against the yardstick of human intuition. The woman's hair on page 16 of the paper doesn't pass a plausibility test - it just looks like goo.

At the end of the day, what a lot of people (in this thread and elsewhere - see this previous Google research https://hific.github.io/) are interested in is an ML approach that can achieve image compression vastly superior to any current approach. That requires being able to recover the real details. My comment answers the question many people probably have: this algorithm doesn't do that (at least with the sample images given).

> It might certainly be possible to come up with an ML approach that was able to discover details in the image that were entirely lost on even highly trained human viewers.

This is certainly a fair point, but that's a different question from recreating the real details from a compressed image where that data is just gone. There are infinitely many images that downsample to, say, the same 2 pixel by 2 pixel image. You can make an algorithm that produces candidate sources, but it cannot tell which candidate was the actual source. The data processing inequality is the formal result covering this. So where you are saying this algorithm doesn't do it, it's moot, because it can't be done.
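To make the "many sources, one downsample" point concrete, here is a minimal sketch in plain Python. It assumes a simple box-filter (block averaging) downsampler, which is my choice for illustration and not necessarily the degradation used in the SR3 paper; the function name `box_downsample` and both toy images are made up for this example.

```python
from fractions import Fraction


def box_downsample(img, factor):
    """Average each factor x factor block of a square grayscale image.

    Assumption: block averaging stands in for "downsampling" here;
    real pipelines may use bicubic or other filters.
    """
    n = len(img) // factor
    out = []
    for by in range(n):
        row = []
        for bx in range(n):
            block = [
                img[by * factor + dy][bx * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            # Exact arithmetic so the equality check below is not
            # affected by floating-point rounding.
            row.append(Fraction(sum(block), len(block)))
        out.append(row)
    return out


# Two clearly different 4x4 "originals": a flat gray patch and a
# high-contrast checkerboard with the same average brightness.
flat = [[100] * 4 for _ in range(4)]
checker = [
    [150 if (x + y) % 2 == 0 else 50 for x in range(4)]
    for y in range(4)
]

low_a = box_downsample(flat, 2)
low_b = box_downsample(checker, 2)

# Both originals collapse to the same 2x2 image, so nothing looking
# only at the 2x2 result can tell which one was the source.
print(flat != checker, low_a == low_b)
```

Any upscaler given that 2x2 image has to pick one candidate (flat, checkerboard, or anything else with matching block averages) using a prior about what images usually look like, which is the "plausible but often wrong" behavior discussed above.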

Thanks!