HN Mirror


by erwannmillon 1050 days ago

Basically, the training works as follows: take a color image in RGB and convert it to LAB. LAB is an alternative color space where the first channel (L) is a greyscale image and the other two channels (a and b) represent the color information.
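A minimal sketch of that conversion for a single pixel. It assumes sRGB input with channels in [0, 1] and a D65 white point; the thread doesn't say which conversion this implementation used, and libraries such as scikit-image ship an equivalent `rgb2lab`.

```python
def srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def rgb_to_lab(rgb: tuple[float, float, float]) -> tuple[float, float, float]:
    """Convert one sRGB pixel (channels in [0, 1]) to CIELAB, assuming a D65 white point."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    # Linear sRGB -> CIE XYZ (D65 primaries)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t: float) -> float:
        # Cube root, with a linear segment near zero
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    l_star = 116 * fy - 16      # lightness: 0 (black) to 100 (white), the greyscale channel
    a_star = 500 * (fx - fy)    # green (-) to red (+)
    b_star = 200 * (fy - fz)    # blue (-) to yellow (+)
    return l_star, a_star, b_star
```

Any neutral grey comes out with a* and b* near zero, so L on its own works as the black-and-white image and (a*, b*) carry all of the color.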

In a traditional pixel-space (non-latent) diffusion model, you noise all the RGB channels and train a Unet to predict the noise at a given timestep.
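For reference, the standard DDPM formulation (Ho et al., 2020) of that training objective, with $x_0$ the clean image, $\beta_t$ a noise schedule, and $\epsilon_\theta$ the Unet:

$$
\bar\alpha_t = \prod_{s=1}^{t} (1 - \beta_s), \qquad
x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon, \qquad
\epsilon \sim \mathcal{N}(0, I)
$$

$$
\mathcal{L}(\theta) = \mathbb{E}_{x_0,\, t,\, \epsilon} \left[ \lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2 \right]
$$

In the colorization setup, $x_0$ would be only the (a, b) channels and the network would also receive L, i.e. $\epsilon_\theta(x_t, L, t)$. This is a sketch of the idea, not necessarily the exact loss used here.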

When colorizing an image, the Unet always "knows" the black and white image (i.e. the L channel).

This implementation only adds noise to the color channels, while keeping the L channel constant.

So to train the model, you need a dataset of colored images. They would be converted to LAB, and the color channels would be noised.
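A minimal sketch of building one such training example in plain Python, for an image given as a list of LAB pixels. The 1000-step linear beta schedule and the assumption that a/b are rescaled to roughly [-1, 1] are common DDPM defaults chosen for illustration, not details taken from this implementation; `make_training_example` is a hypothetical helper name.

```python
import itertools
import math
import operator
import random

# Assumed, not from the original implementation: a 1000-step linear beta schedule.
T = 1000
BETAS = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
# alpha_bar_t = product over s <= t of (1 - beta_s)
ALPHA_BARS = list(itertools.accumulate((1.0 - beta for beta in BETAS), operator.mul))


def make_training_example(lab_pixels, rng=random):
    """Hypothetical helper: turn one color image (already in LAB) into a training pair.

    lab_pixels: list of (L, a, b) tuples; a and b assumed rescaled to roughly [-1, 1].
    Only a/b are noised; L passes through unchanged as the conditioning image.
    Returns (t, model_input, target_noise).
    """
    t = rng.randrange(T)
    keep = math.sqrt(ALPHA_BARS[t])               # weight on the clean chroma
    noise_scale = math.sqrt(1.0 - ALPHA_BARS[t])  # weight on the Gaussian noise
    model_input, target_noise = [], []
    for lightness, a, b in lab_pixels:
        eps_a, eps_b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Clean L plus noisy chroma is what the Unet sees
        model_input.append((lightness, keep * a + noise_scale * eps_a,
                            keep * b + noise_scale * eps_b))
        # The Unet is trained to predict this noise from model_input and t
        target_noise.append((eps_a, eps_b))
    return t, model_input, target_noise
```

Training would then minimize the squared error between the Unet's prediction and `target_noise`.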

You can't train on decolorized images, because the neural network needs to learn how to predict color with a black and white image as context. Without color info, the model can't learn.


bemusedthrow75 1050 days ago

But since you do not have access to colour originals of historical photos in almost every instance, you cannot possibly train the network to have any instinct for the colour sensitivity of the medium, can you?

An extreme example:

https://www.cabinetmagazine.org/issues/51/archibald.php

https://www.messynessychic.com/2016/05/05/max-factors-clown-...

Colourising old TV footage can only result in a misrepresentation, because the underlying colours were deliberately false, chosen so that they would render usably on the medium itself.

And this caricatured example underpins the problem with colourisation: contemporary bias is unavoidable, and can be misleading. Can you take a black and white photo of an African-American woman in the 1930s and accurately colour her skin?

You cannot.

dragonwriter 1050 days ago

> Can you take a black and white photo of an African-American woman in the 1930s and accurately colour her skin?

AI colorization will, in general, be plausible, not accurate.

erwannmillon 1050 days ago

Yeah, the model is racist for sure. That's a limitation of the dataset, though: CelebA is not known for its diversity, but it was easy for me to work with (I trained this model on Colab).

And plausibility is a feature, not a bug.

There are always many plausibly correct colorizations of an image, which you want the model to be able to capture in order to be versatile.

Many colorization models introduce additional losses (such as discriminator losses) that avoid constraining the model to a single "correct answer" when the solution space is actually considerably larger.

morelisp 1050 days ago

In other words, bullshit.

dragonwriter 1050 days ago

No more so than any other colorization method that isn’t dependent on out-of-band info about the particular image (and even that is just more constrained, informed guesswork).

That's what happens when you are filling in missing info that isn't in your source.

EDIT: Of course, color photography can also be “bullshit” rather than accurate about the actual colors of things in the image, as with the red, blue, and green (the actual colors of the physical items) uniforms in Star Trek: The Original Series. It also happens fairly frequently, without any intent to distort, in reproductions of skin tones (often most politically sensitive in the US with non-White subjects, where there are also plenty of examples of deliberate manipulation).

morelisp 1050 days ago

Showing color X on TV by actually making the thing color Y in the studio (well, for filming) isn't bullshit. It's an intentional choice playing out as intended. It is meant to communicate a particular thing and does so.

dragonwriter 1050 days ago

That particular thing was not intentional, and it's the reason why the command wrap uniform (same color in person, different material), which was supposed to be color-matched to the made-as-green uniforms, doesn't match them on screen.

But, yes, in general inaccurate color reproduction can be deliberately planned to create appearances in photos that do not exist in reality.

jackpeterfletch 1050 days ago

*shrug* People like looking at colorised photos because it helps root the image within the setting of the real world they occupy.

For some it’s more evocative, regardless of the absolute accuracy.

Having a professional do it for that picture of your great grandad is expensive.

Having a colourisation subreddit do it is probably worse for accuracy.

I think there is a place for this bullshit.

snvzz 1050 days ago

The original color information just isn't there.

So bullshit is the best you're going to get.

morelisp 1050 days ago

Well, you could also not put more bullshit in the world by not doing the thing.

wruza 1050 days ago

Why are you so negative about it? Pretty sure many people would find it impressive to colorize old photos to look at them as if these were taken in color.

Should artists not put their bs in the world? Writers? Musicians? Most of it is made up but plausible to make you feel something subjective.

roywiggins 1050 days ago

People have been colorizing photos as long as there have been photos.

atorodius 1050 days ago

This is true, but if you have some reference images, you can probably adapt some of the recent diffusion adaptation work, such as DreamBooth, to tell the model “hey, this period looked like this” and fine-tune it.

https://dreambooth.github.io/

coldtea 1050 days ago

>You can't train on decolorized images, because the neural network needs to learn how to predict color with a black and white image as context. Without color info, the model can't learn.

I think the parent means using decolorized images to test the success and guide the training (since they can readily be compared with the colored images they came from, which would be the perfect result).

Not to use decolorized images alone to train for coloring (which doesn't even make sense).

omoikane 1050 days ago

Is there a reason for using LAB as opposed to YCbCr? My understanding is that YCbCr is another model that separates luma (Y) from chroma (Cb and Cr), but JPEG uses YCbCr natively, so I wonder if there would be any advantage in using that instead of LAB?
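For comparison, JPEG (JFIF) defines YCbCr as a fixed affine transform of the 8-bit, gamma-encoded RGB values. A minimal sketch using the full-range JFIF coefficients; plugging it into the diffusion setup would mean noising Cb/Cr and conditioning on Y, a hypothetical variant that nobody in the thread reports trying.

```python
def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Full-range BT.601 YCbCr as used by JPEG/JFIF.

    Inputs are 8-bit gamma-encoded values (0-255). Y is luma (a weighted sum of
    the encoded channels); Cb and Cr are chroma offsets centred on 128.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr
```

As with LAB, every neutral grey lands on Cb = Cr = 128, so Y alone is the black-and-white image.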

TylerE 1050 days ago

The Y in YCbCr is linear, and is just a grayscale image. The L channel in lab is non-linear (as are A and B), and is a complex transfer function designed to mimic the response of the human eye.

A YCbCr colorspace is directly mapped from RGB, and thus is limited to that gamut.

LAB can encode colors brighter than diffuse white (à la #ffffff), like an outdoor scene in direct sunlight.

Sorta HDR (LAB) vs non-HDR (YCbCr).

This image (https://upload.wikimedia.org/wikipedia/commons/thumb/f/f3/Ex...) is a good demo: the left side was processed in LAB, the right in YCbCr. Even reduced back down to a JPEG, the left side is obviously more lifelike, since the highlights and tones were preserved until much later in the processing pipeline.

aendruk 1049 days ago

The description included with that image conflicts with your account:

> An example of color enhancement using LAB colorspace in Photoshop (CIELAB D50). Left side is enhanced, right side is not. Enhancement is "overdone" to show the effect better.

And per the original upload, the “enhancement” demonstrated is a linear compression of the a* and b* channels:

https://upload.wikimedia.org/wikipedia/commons/archive/f/f3/...

The effect is a divergence from the likeness of life, at least as I’ve experienced it.

atorodius 1050 days ago

You can take arbitrary images and convert them to grayscale for training, and do conditional diffusion

bemusedthrow75 1050 days ago

But convert them to grayscale how?

Black and white film doesn't have one single colour sensitivity. Play around with something like DxO FilmPack sometime (it has excellent measurement-based representations of black and white film stocks).

It's a much more complex problem than it might seem on the surface.

atorodius 1050 days ago

Fair, but can’t you just randomize the grayscale generation for training?

bemusedthrow75 1049 days ago

I wanted to say no, that can't work.

And I think it can't work. But now I am not sure!

The other day I was working on a mono photo to prove a point: that a model (a photographic artist's model!) with very striking pink hair was of little concern to a photographer who worked in black and white only, and might actually present some opportunities for choosing tonal separation that are not present in those with non-tinted hair.

In different circumstances (film and filter) her hair could appear (in black and white) to the viewer as if it was likely brunette or likely blonde, before any local (as opposed to image wide) adjustments were made.

The question you are asking, I think, is could you get the hair colour right based on the impact of those same circumstances on other known objects in the scene.

I think the answer is, in the main, no: those objects likely don't survive to make colour comparisons from (and there are known cases where the colourisation of a building has been completely wrong because it had simply been repainted). It's also sometimes not even obvious what a structure actually is without its colour. People who colourise by hand make this mistake too.

But I concede that given that we have to work with contemporary images to have a colour source, randomising the tone curve is the only thing that could work.
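The randomized decolorization suggested above could look something like this minimal sketch: each time a training image is converted to grayscale, draw random channel weights (a stand-in for film sensitivity plus filter) and a random gamma (a stand-in for the tone curve). `random_mono` is a hypothetical name, the ranges are arbitrary assumptions, and this is not a measured model of any real film stock.

```python
import random


def random_mono(rgb_pixels, rng=random):
    """Hypothetical augmentation: decolorize with a randomly drawn film/filter response.

    rgb_pixels: list of (r, g, b) tuples with channels in [0, 1].
    The channel weights stand in for a film stock's colour sensitivity (plus any
    lens filter) and the exponent for its tone curve; the ranges are arbitrary.
    """
    raw = [rng.uniform(0.05, 1.0) for _ in range(3)]
    total = sum(raw)
    w_r, w_g, w_b = (w / total for w in raw)  # normalised so pure white stays at 1.0
    gamma = rng.uniform(0.7, 1.4)             # random tone curve
    return [(w_r * r + w_g * g + w_b * b) ** gamma for r, g, b in rgb_pixels]
```

Across draws, the same pink, e.g. (1.0, 0.4, 0.7), renders to noticeably different greys, which is exactly the ambiguity described above; whether a model can learn something useful from that spread is the open question.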