by Leynos 3526 days ago
A friend of mine suggested that an approach similar to this could be used to upscale old standard definition TV shows (specifically, those shot on video rather than film). I'd imagine that multiple specially trained networks would be employed for different parts of the image (trained on pictures of individual performers or types of set/background). Pleased to see that this is possible. Is there anyone doing something along those lines already?
4 comments

It should also be possible to train it on itself to improve moving scenes by using the motion itself as temporal super-sampling, just like the human eye does.
This works quite well, and does not necessarily require any NN/machine learning. See the YouTube video for this paper: https://www.disneyresearch.com/publication/scenespace/ TL;DR: a simple brute-force weighted average of samples from many frames, combined with a noisy, low-quality depth-from-motion estimate, can be used to de-noise, increase resolution, and otherwise manipulate video footage. Very cool paper with great results from a simple technique.
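To make the "weighted average of samples from many frames" idea concrete, here is a rough sketch. It is not the paper's method: it assumes the frames are already aligned pixel-for-pixel (the paper gathers samples using a depth-from-motion estimate, which is skipped here), and the function name, the Gaussian weighting, and the `sigma` parameter are illustrative choices only.

```python
import math


def temporal_weighted_average(frames, ref_index, sigma=0.1):
    """Denoise frames[ref_index] by averaging co-located pixels over all frames.

    frames: list of 2D lists of floats (grayscale), all the same size and,
            by assumption, already aligned to each other.
    sigma:  hypothetical tolerance; samples that differ strongly from the
            reference pixel get a weight close to zero.
    """
    ref = frames[ref_index]
    height, width = len(ref), len(ref[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            weight_sum = 0.0
            for frame in frames:
                diff = frame[y][x] - ref[y][x]
                # Gaussian weight: similar samples count more, outliers less.
                w = math.exp(-(diff * diff) / (2.0 * sigma * sigma))
                total += w * frame[y][x]
                weight_sum += w
            # weight_sum >= 1 because the reference frame has weight 1.
            out[y][x] = total / weight_sum
    return out
```

With real footage the hard part is finding which samples in other frames correspond to the same scene point; the averaging itself is as simple as shown.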
Ooh
As you suggested, continuity of appearance is what makes this problem so difficult.

As a child, I recall watching a movie that had been converted from black-and-white to color. There were many distracting artifacts. Most notably, the actors' hairlines would shift as they rotated their heads. It made the film unwatchable.

(Author here.) Absolutely! With multiple super-resolution networks, not only continuity but also blending between different regions would present problems. I agree there's a lot of value in domain-specific networks here, as you can see from the faces example on GitHub.

I'd be curious to see an ensemble-based super-resolution, where each model can output the confidence of a pixel region, then have another network learn to blend the result.
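As a rough illustration of the blending step, here is a minimal sketch that replaces the learned blending network with a fixed rule: a per-pixel average weighted by each model's confidence. The function name, the input layout, and the fallback to a plain average are all assumptions made for the example.

```python
def blend_by_confidence(predictions, confidences, eps=1e-8):
    """Blend per-model outputs using per-pixel confidence as the weight.

    predictions: list of 2D lists of floats, one per model (same shape).
    confidences: list of 2D lists of non-negative floats, one per model.
    A learned blender, as proposed above, would replace this fixed rule.
    """
    height, width = len(predictions[0]), len(predictions[0][0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            weights = [conf[y][x] for conf in confidences]
            total = sum(weights)
            if total < eps:
                # No model is confident here: fall back to a plain average.
                out[y][x] = sum(p[y][x] for p in predictions) / len(predictions)
            else:
                out[y][x] = sum(
                    w * p[y][x] for w, p in zip(weights, predictions)
                ) / total
    return out
```

A fixed rule like this still leaves visible seams where confidences change abruptly, which is one reason a second network that learns the blend seems attractive.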

Conversely, these results are achieved using a single top-of-the-range GPU. Everything fits in memory for a batch size of 15 at 192x192. By distributing the training somehow, you could make the network 10x bigger, train for a whole week, and likely get much better general-purpose results.

Is there anyone doing something along those lines already?

I have a side business doing film restoration and am not aware of any solution like that. Probably the best upscaling solution available is from Teranex, which was acquired by Blackmagic Design. Evertz probably also has something in their offering.

It should work; I don't think you need to bother with training it on individual performers. Someone made something like this to improve low-res anime, and it worked well.

In theory you could use this to increase temporal resolution as well. Turn 24 fps movies into 60 fps, and upscale regular HD to 4K.
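For the 24 fps to 60 fps case, the timing side can be sketched as below: each output frame falls between two source frames at some fractional position. The function name and the return format are made up for this example, and the blend fraction is only where a learned interpolator would be asked to synthesize a frame; naively cross-fading the two neighbours produces ghosting on motion.

```python
from fractions import Fraction


def interpolation_plan(num_src_frames, src_fps=24, dst_fps=60):
    """Return one (left_index, right_index, blend) tuple per output frame.

    blend is the fractional position between the two source frames
    (0 means exactly at left_index). Exact fractions avoid rounding drift.
    """
    if num_src_frames < 1:
        raise ValueError("need at least one source frame")
    step = Fraction(src_fps, dst_fps)  # source frames per output frame
    last = num_src_frames - 1
    plan = []
    k = 0
    while True:
        t = k * step  # position of output frame k, in source-frame units
        left = t.numerator // t.denominator
        if left >= last:
            # Reached the final source frame: emit it as-is and stop.
            plan.append((last, last, Fraction(0)))
            break
        plan.append((left, left + 1, t - left))
        k += 1
    return plan
```

For 24 to 60 fps the step is 2/5 of a source frame, so only every fifth output frame coincides with an original frame and the rest have to be synthesized.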