by atorodius 1047 days ago
yes, so infinite training data. but the challenge will be scaling to large resolutions and getting global consistency

Is that challenging? Humans have awful color resolution perception, so even if you have a huge black-and-white image, people would think it looks right even with very low-resolution color information. Or, if the AI hallucinates a lot of high-frequency color noise, it wouldn't be noticeable.

Wikipedia has a great example image here: https://en.wikipedia.org/wiki/Chroma_subsampling. Most people would say all of them looked fine at 1:1 resolution.
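
To make the chroma-subsampling point concrete, here's a minimal NumPy sketch of 4:2:0-style subsampling. It assumes the image is already in a YCbCr-like layout (channel 0 is luma, channels 1 and 2 are chroma) with even height and width. `subsample_420` is a name made up for this example, and real codecs differ in how they filter and position the chroma samples.

```python
import numpy as np

def subsample_420(ycbcr):
    """Average Cb/Cr over each 2x2 block, then repeat back to full size.

    Assumes channel 0 is luma and channels 1-2 are chroma (an assumption
    of this sketch, not something a particular codec guarantees).
    """
    h, w, _ = ycbcr.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even for this sketch")
    out = ycbcr.astype(np.float64).copy()
    chroma = out[..., 1:]  # shape (h, w, 2)
    # Group pixels into 2x2 blocks and average each block per channel.
    blocks = chroma.reshape(h // 2, 2, w // 2, 2, 2).mean(axis=(1, 3))
    # Nearest-neighbour upsample so every pixel in a block shares one value.
    out[..., 1:] = blocks.repeat(2, axis=0).repeat(2, axis=1)
    return out
```

Luma is untouched and each 2x2 block shares one Cb and one Cr value, so the stored samples drop from 3 per pixel to 1.5 per pixel.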

I meant more from a compute standpoint, the models are expensive to run at full res
I see what you mean. I think that you can happily scale the B&W image down, run the model, and then scale the chroma information back up.
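
A rough sketch of that idea, assuming the colorizer only needs to output chroma: downscale the luma, predict chroma at low resolution, upscale the chroma, and stack it with the original full-res luma. `colorize_fullres` and `toy_colorizer` are hypothetical names for illustration; `toy_colorizer` is a stand-in for a real model and just derives a tint from brightness.

```python
import numpy as np

def toy_colorizer(luma_small):
    """Hypothetical stand-in for a colorization model: returns (Cb, Cr)."""
    cb = 128.0 + 0.1 * (luma_small - 128.0)
    cr = 128.0 - 0.1 * (luma_small - 128.0)
    return cb, cr

def colorize_fullres(luma, colorizer, factor=4):
    """Run `colorizer` on a downscaled luma plane, keep full-res luma."""
    h, w = luma.shape
    if h % factor or w % factor:
        raise ValueError("image size must be a multiple of factor")
    luma = luma.astype(np.float64)
    # Area-average downscale: the model only ever sees the small image.
    small = luma.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    cb_small, cr_small = colorizer(small)
    # Nearest-neighbour upsample of the predicted chroma planes.
    cb = cb_small.repeat(factor, axis=0).repeat(factor, axis=1)
    cr = cr_small.repeat(factor, axis=0).repeat(factor, axis=1)
    # Full-resolution detail comes entirely from the original luma.
    return np.stack([luma, cb, cr], axis=-1)
```

The model's cost scales with the small image, while all the sharp detail in the result still comes from the untouched luma plane. A smoother upsampler (bilinear or bicubic) would likely look better than the nearest-neighbour repeat used here.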

Something I was thinking about after writing the comment is that the model is probably trained on chroma-subsampled images. Digital cameras do it with the Bayer filter, and video cameras add 4:2:0 or similar subsampling as they compress the image. So the AI is probably biased towards "look like this photo was taken with a digital camera" versus "actually reconstruct the colors of the image". What effect this actually has, I don't know!

good point, I hadn’t realized that you only need to predict chroma! That actually greatly simplifies things

re. chroma subsampling in training data: this is actually a big problem and a good generative model will absolutely learn to predict chroma subsampled values (or JPEG artifacts even!). you can get around it by applying random downscaling with antialiasing during training.
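
a minimal sketch of that augmentation, assuming area averaging over a random integer factor is a good enough antialiasing filter (a real pipeline would more likely use a library resampler with a proper low-pass filter and non-integer scales). `random_antialiased_downscale` is a made-up name for this example.

```python
import numpy as np

def random_antialiased_downscale(img, rng, factors=(2, 3, 4)):
    """Downscale an (H, W, C) image by a randomly chosen integer factor.

    Each output pixel is the mean of an f x f block, which low-pass
    filters the image before decimation (a simple form of antialiasing).
    """
    f = int(rng.choice(factors))
    h, w, c = img.shape
    # Crop so both sides divide evenly by the chosen factor.
    h2, w2 = h - h % f, w - w % f
    cropped = img[:h2, :w2].astype(np.float64)
    return cropped.reshape(h2 // f, f, w2 // f, f, c).mean(axis=(1, 3))
```

because each output pixel averages over several source pixels, the 2x2 chroma blocks and 8x8 JPEG block edges in the source get blurred together instead of being copied into the training target, and the random factor keeps them from landing on a fixed grid.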

I guess you can always use a two-stage process. First colorize, then upscale
yeah, you can use SOTA super res, but that tends to be generative too (even diffusion based on its own, or more commonly based on GANs). it can be a challenge to synthesize the right high res details.

but that’s basically the stable diffusion paper (diffusion in latent space plus GAN superres)

Yeah, if you have a high res image, you can get color info at super low-res and then regenerate the colors at high res with another model. (though this isn't an efficient approach at all)

https://github.com/TencentARC/T2I-Adapter

i've also seen a controlnet do this.